Methods of incorporating amino acid analogs into proteins

ABSTRACT

The invention provides a method of incorporating nonstandard amino acids into a protein by utilizing a modified aminoacyl-tRNA synthetase to charge the nonstandard amino acid to a modified tRNA, which forms strict Watson-Crick base-pairing with a codon that normally forms wobble base-pairing with natural tRNAs.

STATEMENT OF GOVERNMENT SUPPORT

This invention was made with federal government support under grant number GM62523 awarded by the NIH, and under NSF DMR-0080065 awarded by the NSF. The United States government has certain rights in the invention.

REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 11/130,583, filed May 17, 2005, now pending, which claims the benefit of the filing date of U.S. Provisional Application 60/571,810, filed on May 17, 2004, the entire content of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

Protein engineering is a powerful tool for modification of the structural, catalytic, and binding properties of natural proteins and for the de novo design of artificial proteins. Protein engineering relies on an efficient recognition mechanism for incorporating mutant amino acids in the desired protein sequences. Though this process has been very useful for designing new macromolecules with precise control of composition and architecture, a major limitation is that the mutagenesis is restricted to the 20 naturally occurring amino acids. However, it is becoming increasingly clear that incorporation of unnatural amino acids can extend the scope and impact of protein engineering methods. Thus, for many applications of designed macromolecules, it would be desirable to develop methods for incorporating amino acids that have novel chemical functionality not possessed by the 20 amino acids commonly found in naturally occurring proteins. That is, ideally, one would like to tailor changes in a protein (the size, acidity, nucleophilicity, hydrogen-bonding or hydrophobic properties, etc. of amino acids) to fulfill a specific structural or functional property of interest. The ability to incorporate such amino acid analogs into proteins would greatly expand our ability to rationally and systematically manipulate the structures of proteins, both to probe protein function and to create proteins with new properties. For example, the ability to synthesize large quantities of proteins containing heavy atoms would facilitate protein structure determination, and the ability to site-specifically substitute fluorophores or photo-cleavable groups into proteins in living cells would provide powerful tools for studying protein functions in vivo. One might also be able to enhance the properties of proteins by providing building blocks with new functional groups, such as an amino acid containing a keto-group.

Incorporation of novel amino acids in macromolecules has been successful to an extent. Biosynthetic assimilation of non-canonical amino acids into proteins has been achieved largely by exploiting the capacity of the wild type synthesis apparatus to utilize analogs of naturally occurring amino acids (Budisa 1995, Eur. J. Biochem 230: 788-796; Deming 1997, J. Macromol. Sci. Pure Appl. Chem. A34: 2143-2150; Duewel 1997, Biochemistry 36: 3404-3416; van Hest and Tirrell 1998, FEBS Lett 428(1-2): 68-70; Sharma et al., 2000, FEBS Lett 467(1): 37-40). Nevertheless, the number of amino acids shown conclusively to exhibit translational activity in vivo is small, and the chemical functionality that has been accessed by this method remains modest. In designing macromolecules with desired properties, this poses a limitation since such designs may require incorporation of complex analogs that differ significantly from the natural substrates in terms of both size and chemical properties and hence, are unable to circumvent the specificity of the synthetases. Thus, there is a need to develop a method to further expand the range of unnatural amino acids that can be incorporated.

In recent years, several laboratories have pursued an expansion in the number of genetically encoded amino acids, by using either a nonsense suppressor or a frame-shift suppressor tRNA to incorporate non-canonical amino acids into proteins in response to amber or four-base codons, respectively (Bain et al., J. Am. Chem. Soc. 111: 8013, 1989; Noren et al., Science 244: 182, 1989; Furter, Protein Sci. 7: 419, 1998; Wang et al., Proc. Natl. Acad. Sci. U.S.A., 100: 56, 2003; Hohsaka et al., FEBS Lett. 344: 171, 1994; Kowal and Oliver, Nucleic Acids Res. 25: 4685, 1997). Such methods insert non-canonical amino acids at codon positions that will normally terminate wild-type peptide synthesis (e.g. a stop codon or a frame-shift mutation). These methods have worked well for single-site insertion of novel amino acids. However, their utility in multisite incorporation is limited by modest (20-60%) suppression efficiencies (Anderson et al., J. Am. Chem. Soc. 124: 9674, 2002; Bain et al., Nature 356: 537, 1992; Hohsaka et al., Nucleic Acids Res. 29: 3646, 2001). This is so partially because too high a stop codon suppression efficiency will interfere with the normal translation termination of some non-targeted proteins in the organism. On the other hand, a low suppression efficiency will likely be insufficient to suppress more than one nonsense or frame-shift mutation site in the target protein, such that it becomes more and more difficult or impractical to synthesize a full-length target protein incorporating more and more non-canonical amino acids.
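The multisite limitation can be made concrete with a short calculation. The sketch below assumes, for illustration only, that suppression at each site is an independent event with the same efficiency, so the full-length fraction is the per-site efficiency raised to the number of sites. The function name `full_length_yield` is hypothetical and the independence assumption is not stated in the text.

```python
def full_length_yield(efficiency: float, sites: int) -> float:
    """Fraction of translation events that read through every suppressed site.

    Assumes (illustration only) that each site is suppressed independently
    with the same per-site efficiency.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if sites < 0:
        raise ValueError("sites must be non-negative")
    return efficiency ** sites


# Tabulate the two ends of the 20-60% range quoted above.
for eff in (0.20, 0.60):
    for n in (1, 2, 3, 5):
        print(f"efficiency={eff:.0%} sites={n} -> {full_length_yield(eff, n):.4f}")
```

Under this simplified model, even the upper end of the quoted range falls well below half yield once two or more sites must be suppressed in the same chain.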

Efficient multisite incorporation has been accomplished by replacement of natural amino acids in auxotrophic Escherichia coli strains, and by using aminoacyl-tRNA synthetases with relaxed substrate specificity or attenuated editing activity (Wilson and Hatfield, Biochim. Biophys. Acta 781: 205, 1984; Kast and Hennecke, J. Mol. Biol. 222: 99, 1991; Ibba et al., Biochemistry 33: 7107, 1994; Sharma et al., FEBS Lett. 467: 37, 2000; Tang and Tirrell, Biochemistry 41: 10635, 2002; Datta et al., J. Am. Chem. Soc. 124: 5652, 2002; Doring et al., Science 292: 501, 2001). Although this method provides efficient incorporation of analogues at multiple sites, it suffers from the limitation that the novel amino acid must “share” codons with one of the natural amino acids. Thus for any given codon position where both natural and novel amino acids can be inserted, other than a probability of incorporation, there is relatively little control over which amino acid will end up being inserted. This may be undesirable, since for an engineered enzyme or protein, non-canonical amino acid incorporation at an unintended site may unexpectedly compromise the function of the protein, while failure to incorporate the non-canonical amino acid at the designed site will defeat the design goal.

The invention provides a new technique for the incorporation of non-standard/non-canonical amino acids into proteins that is based on breaking the degeneracy of the genetic code.

SUMMARY OF THE INVENTION

The present invention provides compositions of components used in protein biosynthetic machinery, which include orthogonal tRNA/aminoacyl-tRNA synthetase (AARS) pairs and the individual components of the pairs. Methods for generating and selecting orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, and pairs thereof that can use an unnatural amino acid are also provided. Compositions of the invention include novel orthogonal tRNA/aminoacyl-tRNA synthetase pairs. The novel orthogonal pairs can be used to incorporate an unnatural amino acid in a polypeptide in vitro and in vivo. Other embodiments of the invention include selecting orthogonal pairs.

Compositions of the present invention include an orthogonal aminoacyl-tRNA synthetase (O-RS), where the O-RS preferentially aminoacylates an orthogonal tRNA (O-tRNA) with an unnatural amino acid, optionally, in vivo. In one embodiment, the invention provides a nucleic acid encoding an O-RS, or a complementary nucleic acid sequence thereof. In another embodiment, the O-RS has improved or enhanced enzymatic properties, e.g., the K_(m) is higher or lower, the k_(cat) is higher or lower, the value of k_(cat)/K_(m) is higher or lower or the like, for the unnatural amino acid compared to a naturally occurring amino acid, e.g., one of the 20 known amino acids.

Thus one aspect of the invention relates to a polynucleotide encoding a modified tRNA of a tRNA for a natural amino acid, wherein the natural amino acid is encoded by one or more wobble degenerate codon(s), and wherein the modified tRNA comprises a modified anticodon sequence that forms Watson-Crick base-pairing with one of the wobble degenerate codon(s). Preferably, the modified tRNA is not charged, or is only inefficiently charged, by an endogenous aminoacyl-tRNA synthetase (AARS) for the natural amino acid.
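The distinction between wobble and strict Watson-Crick decoding can be sketched in a few lines of code. The example below is a simplified model, not the method of the invention: it considers only the four standard RNA bases, treats G-U at the third codon position as the only wobble pair, and ignores modified nucleosides such as inosine. The anticodons used in the usage lines (GAA for an unmodified tRNA^(Phe), AAA for a modified one) are assumptions chosen for illustration; this section does not state them.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # simplified: G-U pairs only


def classify_pairing(codon: str, anticodon: str) -> str:
    """Classify codon/anticodon pairing as 'watson-crick', 'wobble', or 'none'.

    Both sequences are written 5'->3', so codon position i pairs with
    anticodon position 2 - i (antiparallel pairing).
    """
    codon, anticodon = codon.upper(), anticodon.upper()
    if len(codon) != 3 or len(anticodon) != 3:
        raise ValueError("codon and anticodon must each have three bases")
    pairs = [(codon[i], anticodon[2 - i]) for i in range(3)]
    # The first two codon positions must pair strictly.
    if any(pair not in WATSON_CRICK for pair in pairs[:2]):
        return "none"
    if pairs[2] in WATSON_CRICK:
        return "watson-crick"
    if pairs[2] in WOBBLE:
        return "wobble"
    return "none"


# Assumed anticodons, for illustration only.
for anticodon in ("GAA", "AAA"):
    for codon in ("UUC", "UUU"):
        print(anticodon, codon, classify_pairing(codon, anticodon))
```

In this model the assumed unmodified anticodon reads UUU only through a wobble pair, whereas the assumed modified anticodon reads UUU with three Watson-Crick pairs and does not pair with UUC at all.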

In certain embodiments, the modified tRNA interacts with the wobble degenerate codon with an affinity at 37° C. of at least about 1.0 kcal/mole, or 1.5 kcal/mole, or even 2.0 kcal/mole more favorably than the interaction between its unmodified version and the wobble degenerate codon.
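To give a sense of scale for these thresholds, the sketch below converts a free-energy difference into a ratio of equilibrium binding constants using the standard relation ratio = exp(ΔΔG / RT). Reading the stated affinity difference as a difference in binding free energy is an interpretation made here, and the function name `fold_change` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_KELVIN = 310.15     # 37 degrees C


def fold_change(ddg_kcal_per_mol: float, temperature: float = T_KELVIN) -> float:
    """Ratio of binding constants implied by a free-energy advantage.

    ddg_kcal_per_mol is the magnitude by which one interaction is more
    favorable than the other, in kcal/mol.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature))


for ddg in (1.0, 1.5, 2.0):
    print(f"{ddg:.1f} kcal/mol -> {fold_change(ddg):.1f}-fold")
```

Under this interpretation, the three thresholds correspond to roughly 5-fold, 11-fold, and 26-fold differences in binding constant at 37° C.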

In certain embodiments, the modified tRNA can be efficiently charged to carry an analog of the natural amino acid (e.g. the unnatural amino acid).

In certain embodiments, the unnatural amino acid is a derivative of at least one of the 20 natural amino acids, with one or more functional groups not present in natural amino acids.

In certain embodiments, the functional group is selected from the group consisting of: bromo-, iodo-, ethynyl-, cyano-, azido-, acetyl, aryl ketone, a photolabile group, a fluorescent group, and a heavy metal.

In certain embodiments, the unnatural amino acid is any one of those described herein or known in the art, such as any one in FIGS. 29, 30, and 31 of US 2003/0108885 A1 (entire content incorporated herein by reference).

In certain embodiments, the amino acid analog is a derivative of Phe, such as Nal.

In certain embodiments, the amino acid analog is a derivative of Trp, such as 6-bromo-L-tryptophan, 6-chloro-L-tryptophan, or Benzothienyl-L-alanine (Sulfur instead of Nitrogen in tryptophan).

In certain embodiments, the modified tRNA, when charged with the unnatural amino acid, can be incorporated by a translation system into a polypeptide comprising the wobble degenerate codon.

In certain embodiments, the modified AARS with relaxed substrate specificity charges the modified tRNA with the unnatural amino acid.

In certain embodiments, the specificity constant (k_(cat)/K_(M)) for activation of the unnatural amino acid by the modified AARS is at least 5-fold larger than that for the natural amino acid.
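The 5-fold criterion is a comparison of two specificity constants and can be checked mechanically once kinetic parameters have been measured. The sketch below is illustrative only: the names `KineticParams` and `meets_fold_criterion` are hypothetical, and no kinetic values from the specification are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KineticParams:
    """Steady-state parameters for activation of one amino acid by an AARS."""
    k_cat: float  # turnover number, e.g. in 1/s
    k_m: float    # Michaelis constant, e.g. in micromolar

    @property
    def specificity_constant(self) -> float:
        if self.k_m <= 0:
            raise ValueError("K_M must be positive")
        return self.k_cat / self.k_m


def meets_fold_criterion(unnatural: KineticParams,
                         natural: KineticParams,
                         fold: float = 5.0) -> bool:
    """True if k_cat/K_M for the unnatural amino acid is at least `fold`
    times that for the natural amino acid (same units for both)."""
    return unnatural.specificity_constant >= fold * natural.specificity_constant
```

Both parameter sets must be expressed in the same units for the ratio to be meaningful.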

In certain embodiments, the tRNA is tRNA^(Phe), the degenerate codon is UUU, and the analog is L-3-(2-naphthyl)alanine (Nal).

In certain embodiments, the modified tRNA further comprises a mutation at the fourth, extended anticodon site for increased translational efficiency.

In certain embodiments, the modified tRNA is charged by the endogenous AARS at a rate no more than 1% of that of the tRNA.

Another aspect of the invention relates to a modified tRNA encoded by any one of the subject polynucleotides, such as those described above.

Another aspect of the invention relates to a method for incorporating an unnatural amino acid into a target protein at one or more specified positions, the method comprising: (1) providing to a translation system a first polynucleotide of the subject invention or a subject modified tRNA; (2) providing to the translation system a second polynucleotide encoding a modified AARS with relaxed substrate specificity, or the modified AARS, wherein the modified AARS is capable of charging the modified tRNA with the unnatural amino acid; (3) providing to the translation system the unnatural amino acid; (4) providing a template polynucleotide encoding the target protein, wherein the codon on the template polynucleotide for the specified position(s) only forms Watson-Crick base-pairing with the modified tRNA; and, (5) allowing translation of the template polynucleotide to proceed, thereby incorporating the unnatural amino acid into the target protein at the specified position(s), wherein steps (1)-(4) are effectuated in any order.
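Step (4) implies a template design task: the codon that pairs by Watson-Crick rules with the modified tRNA should appear only at the specified positions. The sketch below illustrates one way to do this for the Phe/UUU case mentioned elsewhere in this summary. It assumes that non-target Phe positions are recoded to UUC, which is a design choice made for this example rather than a requirement stated in the text, and the function name `recode_phe_sites` is hypothetical.

```python
from typing import Iterable

PHE_CODONS = {"UUU", "UUC"}


def recode_phe_sites(mrna: str, target_positions: Iterable[int]) -> str:
    """Return the coding sequence with UUU only at the targeted Phe positions.

    mrna: in-frame coding sequence (DNA or RNA letters accepted).
    target_positions: 1-based codon positions that should receive the analog.
    All other Phe codons are set to UUC (an assumption of this sketch).
    """
    seq = mrna.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    targets = set(target_positions)
    for pos in targets:
        if not 1 <= pos <= len(codons) or codons[pos - 1] not in PHE_CODONS:
            raise ValueError(f"position {pos} is not a Phe codon")
    recoded = []
    for index, codon in enumerate(codons, start=1):
        if codon in PHE_CODONS:
            codon = "UUU" if index in targets else "UUC"
        recoded.append(codon)
    return "".join(recoded)


# Toy sequence: Met-Phe-Phe-Phe-stop, with the analog wanted at codon 3 only.
print(recode_phe_sites("AUGUUUUUCUUUUAA", {3}))
```

The encoded natural-amino-acid sequence is unchanged by the recoding; only the choice among synonymous Phe codons differs.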

In certain embodiments, the translation system is an in vitro translation system, such as a wheat germ lysate-based IVT system, an E. coli system for coupled in vitro transcription/translation, or a rabbit reticulocyte lysate-based IVT system.

In certain embodiments, the translation system is a cell.

In certain embodiments, step (3) is effectuated by contacting the cell with a solution containing the unnatural amino acid.

In certain embodiments, the unnatural amino acid is an analog of the natural amino acid.

In certain embodiments, the unnatural amino acid is an analog of at least one amino acid different from the natural amino acid.

In certain embodiments, the unnatural amino acid is not an analog of any natural amino acids.

In certain embodiments, the unnatural amino acid comprises a side-chain R group selected from: alkyl-, aryl-, acyl-, keto-, azido-, hydroxyl-, hydrazine, cyano-, halo-, hydrazide, alkenyl, alkynyl, ether, thiol, seleno-, sulfonyl-, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, ester, thioacid, hydroxylamine, amino group, or the like or any combination thereof.

In certain embodiments, the unnatural amino acid comprises a photoactivatable cross-linker, or is a spin-labeled amino acid, fluorescent amino acid, a metal-binding amino acid, a metal-containing amino acid, a radioactive amino acid, an amino acid with novel functional group(s), an amino acid that covalently or noncovalently interacts with other molecules, a photocaged and/or photoisomerizable amino acid, an amino acid comprising biotin or a biotin analog, a glycosylated amino acid comprising a sugar-substituted serine, a carbohydrate-modified amino acid, a keto-containing amino acid, an amino acid comprising polyethylene glycol or polyether, heavy atom-substituted amino acid, a chemically cleavable and/or photocleavable amino acid, an amino acid with an elongated side-chain as compared to natural amino acids (e.g., polyethers or long chain hydrocarbons, e.g., greater than about 5 or greater than about 10 carbons), a carbon-linked sugar-containing amino acid, a redox-active amino acid, an amino thioacid-containing amino acid, or an amino acid comprising one or more toxic moiety.

In certain embodiments, the unnatural amino acid is represented by Formula II or III:

wherein

Z comprises —OH, —NH₂, —SH, —NH—R′, or S—R′;

X and Y, which may be the same or different, comprise S or O; and

R and R′, which may be the same or different, are selected from: alkyl-, aryl-, acyl-, keto-, azido-, hydroxyl-, hydrazine, cyano-, halo-, hydrazide, alkenyl, alkynyl, ether, thiol, seleno-, sulfonyl-, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, ester, thioacid, hydrogen, hydroxylamine, amino group, or the like or any combination thereof;

or is selected from: α-hydroxy acids, α-thioacids, α-aminothiocarboxylates (e.g., with side chains corresponding to the 20 natural amino acids or unnatural side chains).

In certain embodiments, the unnatural amino acid is an L, D, or α,α-disubstituted amino acid selected from D-glutamate, D-alanine, D-methyl-O-tyrosine, or aminobutyric acid.

In certain embodiments, the unnatural amino acid comprises a functional group selected from: bromo-, iodo-, ethynyl-, cyano-, azido-, acetyl, aryl ketone, photolabile, fluorescent, or heavy metal group.

In certain embodiments, the unnatural amino acid is a cyclic amino acid selected from: a 3-, 4-, 6-, 7-, 8-, and 9-membered ring proline analog; a β or γ amino acid selected from substituted β-alanine or γ-aminobutyric acid.

In certain embodiments, the unnatural amino acid is a Tyrosine analog selected from: a para-substituted tyrosine, an ortho-substituted tyrosine, a meta-substituted tyrosine, wherein the substituted tyrosine comprises an acetyl group, a benzoyl group, an amino group, a hydrazine, a hydroxyamine, a thiol group, a carboxy group, an isopropyl group, a methyl group, a C6-C20 straight chain or branched hydrocarbon, a saturated or unsaturated hydrocarbon, an O-methyl group, a polyether group, a nitro group, or multiply substituted aryl rings; a Glutamine analog selected from: α-hydroxy derivatives, β-substituted derivatives, cyclic derivatives, or amide-substituted glutamine derivatives; a Phenylalanine analog selected from: meta-substituted phenylalanines, wherein the substituent comprises a hydroxy group, a methoxy group, a methyl group, an allyl group, an acetyl group, or the like.

In certain embodiments, the unnatural amino acid is an O-methyl-L-tyrosine, an L-3-(2-naphthyl)alanine, a 3-methyl-phenylalanine, an O-4-allyl-L-tyrosine, a 4-propyl-L-tyrosine, a tri-O-acetyl-GlcNAcβ-serine, an L-Dopa, a fluorinated phenylalanine, an isopropyl-L-phenylalanine, a p-azido-L-phenylalanine, a p-acyl-L-phenylalanine, a p-benzoyl-L-phenylalanine, an L-phosphoserine, a phosphonoserine, a phosphonotyrosine, a p-iodo-phenylalanine, a p-bromophenylalanine, a p-amino-L-phenylalanine, or an isopropyl-L-phenylalanine.

In certain embodiments, the unnatural amino acid modifies one or more biological properties of a protein into which it is incorporated, the biological properties comprising: toxicity, biodistribution, solubility, thermal stability, hydrolytic stability, oxidative stability, resistance to enzymatic degradation, facility of purification and processing, structural properties, spectroscopic properties, chemical and/or photochemical properties, catalytic activity, redox potential, half-life, and ability to react with other molecules either covalently or noncovalently.

In certain embodiments, the modified tRNA can be charged to carry the unnatural amino acid by the modified AARS with relaxed substrate specificity.

In certain embodiments, the specificity constant (k_(cat)/K_(M)) for activation of the unnatural amino acid by the modified AARS is at least 5-fold larger than that for the natural amino acid.

In certain embodiments, the modified tRNA is charged by an endogenous AARS at a rate no more than 1% of that of its cognate tRNA.

In certain embodiments, the unnatural amino acid is provided by introducing additional nucleic acid construct(s) into the translation system, wherein the additional nucleic acid construct(s) encode one or more proteins required for biosynthesis of the unnatural amino acid.

In certain embodiments, at least one of the additional nucleic acid construct(s) is operably linked to and subject to the control of an inducible promoter.

In certain embodiments, the first and the second polynucleotides are present on the same molecule.

In certain embodiments, the first and second polynucleotides are encoded by a plasmid or plasmids.

In certain embodiments, the plasmid or plasmids have a selectable marker.

In certain embodiments, the selectable marker is an antibiotic resistance gene.

In certain embodiments, the first polynucleotide further comprises a first promoter sequence controlling the expression of the modified tRNA.

In certain embodiments, the first promoter is an inducible promoter.

In certain embodiments, the second polynucleotide further comprises a second promoter sequence controlling the expression of the modified AARS.

In certain embodiments, the cell is auxotrophic for the natural amino acid encoded at the specified position.

In certain embodiments, the translation system lacks endogenous tRNA that forms Watson-Crick base-pairing with the codon at the specified position.

In certain embodiments, the translation system is a cell, and the method further comprises disabling one or more genes encoding any endogenous tRNA that forms Watson-Crick base-pairing with the codon at the specified position(s).

In certain embodiments, the translation system is a cell, and the method further comprises inhibiting one or more endogenous AARS that charges tRNAs that form Watson-Crick base-pairing with the codon.

In certain embodiments, the cell is a bacterial cell, such as an E. coli cell.

In certain embodiments, the cell is an insect cell.

In certain embodiments, the cell is a mammalian cell.

In certain embodiments, the cell is a fungal cell, such as a yeast cell.

In certain embodiments, the modified tRNA and/or the modified AARS are derived from an organism different from that of the cell.

In certain embodiments, the method further comprises verifying the incorporation of the analog. For example, the incorporation of the analog can be verified by mass spectrometry.
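Verification by mass spectrometry rests on the mass difference between the natural residue and the analog. The sketch below computes the expected monoisotopic shift per substitution from elemental formulas. The formulas for Phe, Leu, and Nal and the monoisotopic atomic masses are standard chemical values supplied here; they are not taken from this specification, and the function names are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental formulas of the free amino acids.
PHE = {"C": 9, "H": 11, "N": 1, "O": 2}   # phenylalanine, C9H11NO2
LEU = {"C": 6, "H": 13, "N": 1, "O": 2}   # leucine, C6H13NO2
NAL = {"C": 13, "H": 13, "N": 1, "O": 2}  # 3-(2-naphthyl)alanine, C13H13NO2


def monoisotopic_mass(formula: dict) -> float:
    """Sum of monoisotopic atomic masses for an elemental formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mass_shift(natural: dict, analog: dict) -> float:
    """Expected peptide mass change per substitution of natural by analog.

    The peptide backbone contribution cancels, so the shift equals the
    difference between the free amino acid masses.
    """
    return monoisotopic_mass(analog) - monoisotopic_mass(natural)


print(f"Phe -> Nal: {mass_shift(PHE, NAL):+.4f} Da per site")
print(f"Leu -> Nal: {mass_shift(LEU, NAL):+.4f} Da per site")
```

A peptide fragment carrying n substitutions would be expected to shift by n times the per-site value, which is what a comparison of mass spectra of the modified and unmodified fragments would look for.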

In certain embodiments, the analog is incorporated into the position at an efficiency of at least about 50%.

Another aspect of the invention provides a translation system comprising the polynucleotide of the subject invention.

In certain embodiments, the translation system further comprises a second polynucleotide encoding a modified AARS with relaxed substrate specificity, or the modified AARS, wherein the modified AARS is capable of charging the modified tRNA with an unnatural amino acid.

In certain embodiments, the translation system comprises more than two different subject polynucleotides, each of the polynucleotides capable of carrying a different unnatural amino acid.

In certain embodiments, the translation system is a cell.

In certain embodiments, the modified tRNA is from an organism different from that of the cell.

In certain embodiments, the modified tRNA is from a yeast, and the cell is an E. coli bacterium.

In certain embodiments, the modified AARS and the tRNA are from the same organism, and the organism is different from that of the cell.

In certain embodiments, the modified AARS and the tRNA are from a yeast, and the cell is an E. coli bacterium.

In certain embodiments, the expression and/or function of an endogenous tRNA homologous to the tRNA is impaired or abolished.

In certain embodiments, the expression of the endogenous tRNA is impaired/abolished by inhibiting the function of the endogenous tRNA's cognate AARS, thereby impairing/abolishing the charging of the endogenous tRNA.

In certain embodiments, the expression of the endogenous tRNA is abolished by deleting the gene encoding the endogenous tRNA.

Another aspect of the invention provides a vector comprising the subject polynucleotides.

In certain embodiments, the polynucleotide is operably linked to, and under the transcription control of, a promoter.

In certain embodiments, the promoter is an inducible promoter.

In certain embodiments, the vector is an expression vector suitable for expressing the polynucleotide in a eukaryotic and/or a prokaryotic cell.

Another aspect of the invention provides a method for PEGylating a polypeptide, comprising: (1) incorporating one or more unnatural amino acid(s) at specified position(s) of the polypeptide using any of the suitable subject methods, wherein the unnatural amino acid(s) serves as site-specific PEGylation sites; (2) PEGylating the polypeptide.

In certain embodiments, the unnatural amino acid does not contain a primary amine or thiol side-chain group.

In certain embodiments, the unnatural amino acid is linked to PEG moieties through a triazole linkage.

In certain embodiments, the triazole linkage is formed by copper-mediated Huisgen [3+2] cycloaddition of an azide and an alkyne.

In certain embodiments, the azide group is provided by para-azidophenylalanine, and the alkyne group is provided by an alkyne derivatized PEG reagent.

In certain embodiments, the polypeptide, when PEGylated, has one or more of: longer half-life, sustained or enhanced biological activity, homogeneous modification, increased potency and stability and/or decreased immunogenicity, and consistency in biological activities from lot to lot.

Another aspect of the invention provides a PEGylated polypeptide produced by any of the subject methods.

Another aspect of the invention provides a method for enhancing half-life of a cytokine or a growth factor, comprising incorporating one or more unnatural amino acid(s) at specified position(s) of the polypeptide using any of the suitable subject methods, wherein the unnatural amino acid(s) reduces binding affinity of the cytokine or growth factor to its receptor in endosomes, thereby increasing the half-life of the cytokine or growth factor.

In certain embodiments, the unnatural amino acid changes protonation states between cell-surface and endosomal pH.

Another aspect of the invention provides a cytokine or a growth factor produced by the suitable subject methods.

Another aspect of the invention provides a method for glycosylating a polypeptide, comprising: (1) incorporating one or more unnatural amino acid(s) at specified position(s) of the polypeptide using any of the suitable subject methods, wherein the unnatural amino acid(s) serves as site-specific glycosylation site; (2) contacting the polypeptide with a saccharide moiety to form a covalent bond that attaches the saccharide moiety to the unnatural amino acid of the protein.

In certain embodiments, the unnatural amino acid comprises a first reactive group; and the saccharide moiety comprises a second reactive group, wherein the first reactive group reacts with the second reactive group in (2).

In certain embodiments, the first reactive group is an electrophilic or nucleophilic moiety, and the second reactive group is a nucleophilic or electrophilic moiety, respectively.

In certain embodiments, the electrophilic moiety is a carbonyl group, a sulfonyl group, an aldehyde group, a ketone group, a hindered ester group, a thioester group, a stable imine group, an epoxide group, or an aziridine group.

In certain embodiments, the nucleophilic moiety includes: an aliphatic or aromatic amine, ethylenediamine, —NR1-NH2 (hydrazide), —NR1(C═O)NR2NH2 (semicarbazide), —NR1(C═S)NR2NH2 (thiosemicarbazide), —(C═O)NR1NH2 (carbonylhydrazide), —(C═S)NR1NH2 (thiocarbonylhydrazide), —(SO2)NR1NH2 (sulfonylhydrazide), —NR1NR2(C═O)NR3NH2 (carbazide), —NR1NR2(C═S)NR3NH2 (thiocarbazide), —O—NH2 (hydroxylamine), where each R1, R2, and R3 is independently H, or alkyl having 1-6 carbons.

In certain embodiments, the saccharide moiety includes a single carbohydrate moiety, or two or more carbohydrate moieties.

In certain embodiments, the method further comprises contacting the saccharide moiety with one or more glycosyl transferase(s), a sugar donor moiety, and other reactants required for glycosyl transferase activity for a sufficient time and under appropriate conditions to transfer a sugar from the sugar donor moiety to the saccharide moiety.

In certain embodiments, the glycosyl transferase(s) comprises one or more of: a β1-4N-acetylglucosaminyl transferase, an α1,3-fucosyl transferase, an α1,2-fucosyl transferase, an α1,4-fucosyl transferase, a β1-4-galactosyl transferase, or a sialyl transferase.

In certain embodiments, the saccharide moiety comprises a terminal GlcNAc, the sugar donor moiety is UDP-Gal, and the glycosyl transferase is a β-1,4-galactosyl transferase.

In certain embodiments, the saccharide moiety comprises a terminal GlcNAc, the sugar donor moiety is UDP-GlcNAc and the glycosyl transferase is a β1-4N-acetylglucosaminyl transferase.

In certain embodiments, the first and second reactive groups produce a reaction product comprising an oxime, an amide, a hydrazone, a reduced hydrazone, a carbohydrazone, a thiocarbohydrazone, a sulfonylhydrazone, a semicarbazone, or a thiosemicarbazone.

In certain embodiments, the polypeptide is a therapeutic, diagnostic, or other protein selected from: Alpha-1 antitrypsin, Angiostatin, Antihemolytic factor, antibodies, Apolipoprotein, Apoprotein, Atrial natriuretic factor, Atrial natriuretic polypeptide, Atrial peptides, C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG), Calcitonin, CC chemokines (e.g., Monocyte chemoattractant protein-1, Monocyte chemoattractant protein-2, Monocyte chemoattractant protein-3, Monocyte inflammatory protein-1 alpha, Monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262), CD40 ligand, C-kit Ligand, Collagen, Colony stimulating factor (CSF), Complement factor 5a, Complement inhibitor, Complement receptor 1, cytokines (e.g., epithelial Neutrophil Activating Peptide-78, GROα/MGSA, GROβ, GROγ, MIP-1α, MIP-1δ, MCP-1), Epidermal Growth Factor (EGF), Erythropoietin, Exfoliating toxins A and B, Factor IX, Factor VII, Factor VIII, Factor X, Fibroblast Growth Factor (FGF), Fibrinogen, Fibronectin, G-CSF, GM-CSF, Glucocerebrosidase, Gonadotropin, growth factors, Hedgehog proteins (e.g., Sonic, Indian, Desert), Hemoglobin, Hepatocyte Growth Factor (HGF), Hirudin, Human serum albumin, Insulin, Insulin-like Growth Factor (IGF), interferons (e.g., IFN-α, IFN-β, IFN-γ), interleukins (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, etc.), Keratinocyte Growth Factor (KGF), Lactoferrin, leukemia inhibitory factor, Luciferase, Neurturin, Neutrophil inhibitory factor (NIF), oncostatin M, Osteogenic protein, Parathyroid hormone, PD-ECSF, PDGF, peptide hormones (e.g., Human Growth Hormone), Pleiotropin, Protein A, Protein G, Pyrogenic exotoxins A, B, and C, Relaxin, Renin, SCF, Soluble complement receptor I, Soluble I-CAM 1, Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15), Soluble TNF receptor, Somatomedin, Somatostatin, Somatotropin, Streptokinase, Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Superoxide dismutase (SOD), Toxic shock syndrome toxin (TSST-1), Thymosin alpha 1, Tissue plasminogen activator, Tumor necrosis factor beta (TNF beta), Tumor necrosis factor receptor (TNFR), Tumor necrosis factor-alpha (TNF alpha), Vascular Endothelial Growth Factor (VEGF), Urokinase; a transcriptional modulator that modulates cell growth, differentiation, or regulation, wherein the transcriptional modulator is from prokaryotes, viruses, or eukaryotes, including fungi, plants, yeasts, insects, and animals, including mammals; expression activator selected from cytokines, inflammatory molecules, growth factors, their receptors, oncogene products, interleukins (e.g., IL-1, IL-2, IL-8, etc.), interferons, FGF, IGF-I, IGF-II, FGF, PDGF, TNF, TGF-α, TGF-β, EGF, KGF, SCF/c-Kit, CD40L/CD40, VLA-4/VCAM-1, ICAM-1/LFA-1, and hyalurin/CD44; signal transduction molecules and corresponding oncogene products, e.g., Mos, Ras, Raf, and Met; transcriptional activators and suppressors, e.g., p53, Tat, Fos, Myc, Jun, Myb, Rel; steroid hormone receptors selected from receptors for estrogen, progesterone, testosterone, aldosterone, LDL, or corticosterone; or an enzyme selected from: amidases, amino acid racemases, acylases, dehalogenases, dioxygenases, diarylpropane peroxidases, epimerases, epoxide hydrolases, esterases, isomerases, kinases, glucose isomerases, glycosidases, glycosyl transferases, haloperoxidases, monooxygenases (e.g., p450s), lipases, lignin peroxidases, nitrile hydratases, nitrilases, proteases, phosphatases, subtilisins, transaminase, or nucleases.

Another aspect of the invention provides a glycosylated polypeptideproduced by any of the suitable subject methods.

Another aspect of the invention provides a method for generating animmunoconjugate comprising an antibody (or functionalfragment/derivative thereof) and one or more therapeutic moieties, themethod comprising: (1) incorporating one or more unnatural amino acid(s)at specified position(s) of the antibody using any of the suitablesubject methods; (2) contacting the antibody with the one or moretherapeutic moieties to form a conjugate that attaches the one or moretherapeutic moieties to the unnatural amino acid(s) of the antibody.

In certain embodiments, the therapeutic moieties are different.

In certain embodiments, the therapeutic moieties are conjugated to thesame unnatural amino acids.

In certain embodiments, the therapeutic moieties are conjugated todifferent unnatural amino acids.

In certain embodiments, the therapeutic moieties are cleavable under one or more conditions selected from: mild or weak acidic conditions (e.g., about pH 4-6, preferably about pH 5), reductive environment (e.g., the presence of a reducing agent), divalent cations, or (optionally) heat.

Another aspect of the invention provides an immunoconjugate produced by any of the suitable subject methods.

Another aspect of the invention provides a method for immobilizing one or more polypeptide(s) to an array, the method comprising: (1) incorporating one or more unnatural amino acid(s) at specified position(s) of the polypeptide(s) using any of the suitable subject methods; and (2) contacting the polypeptide(s) with a solid support to conjugate the polypeptide(s) through the unnatural amino acid(s).

In certain embodiments, the one or more polypeptides are attached to the solid support in a consistent orientation.

In certain embodiments, the active site(s) of each polypeptide are accessible to potentially interacting molecules.

Another aspect of the invention provides a polypeptide array produced by any of the suitable subject methods.

All embodiments described above and those in other parts of the specification are contemplated to be able to freely combine with one or more other embodiments, even for those embodiments described under separate aspects of the invention, unless such combinations are specifically excluded or would contradict the general principles and/or teachings of the instant specification.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows a schematic diagram for multiple-site-specific incorporation of an unnatural amino acid into the UUU codon.

FIG. 2 shows the incorporation (or lack thereof) of Nal in place of Phe in several tryptic fragments of mDHFR, in response to the UUU codon. These data unambiguously establish that Nal incorporation is codon-biased to UUU.

FIG. 3 shows a schematic diagram for multiple-site-specific incorporation of an unnatural amino acid into the UUG codon.

FIG. 4 demonstrates the replacement of Leu by Nal as detected in MALDI mass spectra of tryptic fragments of mDHFR.

FIG. 5 shows the effect of AZL on replacement of Leu by Nal as evaluated by MALDI mass spectra of tryptic fragments of mDHFR.

DETAILED DESCRIPTION OF THE INVENTION

I. Overview

Proteins are at the crossroads of virtually every biological process, from photosynthesis and vision to signal transduction and the immune response. These complex functions result from a polyamide-based polymer consisting of twenty relatively simple building blocks arranged in a defined primary sequence.

The present invention includes methods and compositions for use in the site-specific incorporation of unnatural amino acids directly into proteins in vivo. Importantly, the unnatural amino acid is added to the genetic repertoire, rather than substituting for one of the common 20 amino acids. The present invention provides methods for generating, methods for identifying, and compositions comprising the components used by the biosynthetic machinery to incorporate an unnatural amino acid into a protein.

The present invention, e.g., (i) allows the site-selective insertion of one or more unnatural amino acids at any desired position of any protein, (ii) is applicable to both prokaryotic and eukaryotic cells, (iii) enables in vivo studies of mutant proteins in addition to the generation of large quantities of purified mutant proteins, and (iv) is adaptable to incorporate any of a large variety of unnatural amino acids into proteins in vivo. Thus, in a specific polypeptide sequence a number of different site-selective insertions of unnatural amino acids are possible. Such insertions are optionally all of the same type (e.g., multiple examples of one type of unnatural amino acid inserted at multiple points in a polypeptide) or are optionally of diverse types (e.g., different unnatural amino acid types are inserted at multiple points in a polypeptide).

The invention provides methods and reagents for incorporating amino acid analogs into a target protein. The modified target proteins thus produced are useful for discovery of potentially useful therapeutic molecules, biomaterials, and other proteins of interest. Such proteins also are useful for functional and structural studies of proteins as well as for biochemical study of the translation system.

One aspect of the invention provides a polynucleotide encoding a modified tRNA based on a wild-type tRNA for a natural amino acid.

In certain embodiments, the natural amino acid is encoded by two or more genetic codes (thus encoded by degenerate genetic codes). In most, if not all, cases, this includes 18 of the 20 natural amino acids (all except Met and Trp). In these circumstances, to recognize all the degenerate genetic codes for the natural amino acid, the anticodon loop of the wild-type tRNA(s) relies on both wobble base-pairing and pure Watson-Crick base-pairing. The subject modified tRNA contains at least one modification in its anticodon loop, such that the modified anticodon loop now forms Watson-Crick base-pairing to one of the degenerate genetic codes, which the tRNA previously bound only through wobble base-pairing (see Example I below).

Since Watson-Crick base pairing is invariably stronger and more stable than wobble base pairing, the subject modified tRNA will preferentially bind to a previous wobble base-pairing genetic code (now through Watson-Crick base-pairing), over a previous Watson-Crick base-pairing (now through wobble base-pairing). Thus an analog may be incorporated at the subject codon, if the modified tRNA is charged with an analog of a natural amino acid, which may or may not be the same as the natural amino acid encoded by the codon in question.

For example, in Example II below, some Phe in mouse DHFR (mDHFR) are encoded by UUC codons, some others by UUU codons. The wild-type E. coli tRNA for Phe has a GAA anticodon sequence, and thus binds the UUC codons through Watson-Crick base-pairing, and binds the UUU codons through wobble base-pairing. Thus in E. coli, a modified tRNA, such as a yeast tRNA for Phe, may have a modified anticodon sequence of AAA, so that it now preferentially binds to the previously “disfavored” UUU codons. When such a modified Phe tRNA is charged with Nal, it competes with the wild-type Phe tRNA charged with Phe for the UUU codon. But since the modified tRNA binds UUU through the stronger Watson-Crick base-pairing, Nal (rather than Phe) will be preferentially, if not exclusively, inserted at the UUU codons.

In fact, the anticodon sequence of the modified tRNA may be changed in such a way that it now recognizes a codon for a different natural amino acid. For example, in Example III, the Phe tRNA anticodon sequence is changed from GAA to CAA, which is capable of Watson-Crick base-pairing with a Leu (rather than a Phe) codon UUG. Such a modified Phe tRNA can now incorporate Nal into certain Leu codons.
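The pairing described for the GAA, AAA and CAA anticodons can be illustrated with a minimal sketch. This is not part of the disclosed method: the function name `classify_pairing` is hypothetical, and the model is deliberately simplified so that only G·U counts as a wobble pair at the third codon position. Modified bases and other non-standard pairings are ignored, so a result of "none" only means the pair is outside this simplified model.

```python
# Simplified codon-anticodon pairing model (illustrative assumption only).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # only G.U wobble is modeled here


def classify_pairing(anticodon: str, codon: str) -> str:
    """Classify how an anticodon reads a codon (both written 5'->3')."""
    if len(anticodon) != 3 or len(codon) != 3:
        raise ValueError("anticodon and codon must each have three bases")
    # Pairing is antiparallel: anticodon base 3 pairs with codon base 1, etc.
    pairs = list(zip(reversed(anticodon.upper()), codon.upper()))
    # The first two codon positions must pair by Watson-Crick rules.
    if any(pair not in WATSON_CRICK for pair in pairs[:2]):
        return "none"
    # The third codon position decides Watson-Crick versus wobble.
    if pairs[2] in WATSON_CRICK:
        return "watson-crick"
    if pairs[2] in WOBBLE:
        return "wobble"
    return "none"


for anticodon, codon in [("GAA", "UUC"), ("GAA", "UUU"),
                         ("AAA", "UUU"), ("CAA", "UUG")]:
    print(anticodon, codon, classify_pairing(anticodon, codon))
```

Under these assumptions the wild-type GAA anticodon reads UUC by Watson-Crick pairing and UUU by wobble pairing, whereas the AAA and CAA anticodons read UUU and UUG, respectively, by Watson-Crick pairing.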

Thus in certain embodiments, if it is desirable to incorporate certain amino acid analogs at codons for Met or Trp, a tRNA for a natural amino acid (e.g., a Met tRNA, a Trp tRNA, or even a Phe tRNA, etc.) may be modified to recognize the Met or Trp codon. Under this type of unique situation, both the modified tRNA and the natural tRNA compete to bind the same (single) genetic code through Watson-Crick base-pairing. Some but not all such codons will accept their natural amino acids, while others may accept amino acid analogs carried by the modified tRNA. Other factors, such as the abundance of the natural amino acid vs. that of the analog, may affect the final outcome.

This also applies to other situations where a modified tRNA competes with wild-type tRNA for any natural amino acids. Such modified tRNAs are within the scope of the instant invention.

In certain preferred embodiments, the modified tRNA is not charged or only inefficiently charged by an endogenous aminoacyl-tRNA synthetase (AARS) for any natural amino acid, such that the modified tRNA largely (if not exclusively) carries an amino acid analog, but not a natural amino acid. Nevertheless, a subject modified tRNA may still be useful even if it can be charged by the endogenous AARS with a natural amino acid.

In certain embodiments, the modified tRNA charged with an amino acid analog has such an overall shape and size that the analog-tRNA is a ribosomally acceptable complex, that is, the tRNA-analog complex can be accepted by the prokaryotic or eukaryotic ribosomes in an in vivo or in vitro translation system.

In certain embodiments, the modified tRNA can be efficiently charged to carry an analog of a natural amino acid. The amino acid analog may be a derivative of at least one of the 20 natural amino acids, with one or more functional groups not present in natural amino acids. For example, the functional group may be selected from the group consisting of: bromo-, iodo-, ethynyl-, cyano-, azido-, acetyl, aryl ketone, a photolabile group, a fluorescent group, and a heavy metal.

In one embodiment, the amino acid analog is a derivative of Phe, such as Nal.

In certain embodiments, the modified tRNA can be charged to carry the analog by a modified AARS with relaxed substrate specificity.

Preferably, the modified AARS specifically or preferentially charges the analog to the modified tRNA over any natural amino acid. In a preferred embodiment, the specificity constant for activation of the analog by the modified AARS (defined as k_(cat)/K_(M)) is at least about 2-fold larger than that for the natural amino acid, preferably about 3-fold, 4-fold, 5-fold or more than that for the natural amino acid.
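As a worked illustration of the fold-difference in specificity constants, the following minimal sketch compares k_(cat)/K_(M) for an analog and a natural amino acid. The function names and all kinetic values are hypothetical placeholders chosen only to make the arithmetic easy to follow; they are not measured data.

```python
def specificity_constant(k_cat: float, k_m: float) -> float:
    """Return k_cat / K_M for one substrate."""
    if k_m <= 0:
        raise ValueError("K_M must be positive")
    return k_cat / k_m


def fold_preference(analog: tuple, natural: tuple) -> float:
    """Ratio of specificity constants, analog over natural amino acid.

    Each argument is a (k_cat, K_M) pair in consistent units.
    """
    return specificity_constant(*analog) / specificity_constant(*natural)


# Hypothetical values: (k_cat in 1/s, K_M in mM).
analog_kinetics = (1.0, 2.0)
natural_kinetics = (0.5, 4.0)
print(fold_preference(analog_kinetics, natural_kinetics))
```

With these placeholder numbers the analog has a specificity constant of 0.5 and the natural amino acid 0.125, a 4-fold preference for the analog.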

In a preferred embodiment, the tRNA is tRNA^(Phe), the degenerate codon is UUU, and the analog is L-3-(2-naphthyl)alanine (Nal).

In certain embodiments, the modified tRNA further comprises a mutation at the fourth, extended anticodon site for increased translational efficiency.

In certain embodiments, the modified tRNA is charged by the endogenous AARS at a rate no more than about 50%, 30%, 20%, 10%, 5%, 2%, or 1% of that of the tRNA.

Another aspect of the invention provides a modified tRNA encoded by any one of the subject polynucleotides.

Another aspect of the invention provides a method for incorporating an amino acid analog into a target protein at one or more specified positions, the method comprising: (1) providing to an environment a first subject polynucleotide for a modified tRNA, or a subject modified tRNA; (2) providing to the environment a second subject polynucleotide encoding a modified AARS with relaxed substrate specificity, or the modified AARS, wherein the modified AARS is capable of charging the modified tRNA with the analog; (3) providing to the environment the analog; (4) providing a template polynucleotide encoding the target protein, wherein the codon on the template polynucleotide for the specified position only forms Watson-Crick base-pairing with the modified tRNA; and, (5) allowing translation of the template polynucleotide to proceed, thereby incorporating the analog into the target protein at the specified position, wherein steps (1)-(4) are effectuated in any order.

In certain embodiments, the methods of the invention involve introducing into an environment (e.g., a cell or an in vitro translation system (IVT)) a first nucleic acid encoding an orthogonal/modified tRNA molecule that is not charged efficiently by an endogenous aminoacyl-tRNA synthetase in the cell/in vitro translation system (IVT), or the orthogonal/modified tRNA itself. The orthogonal/modified tRNA molecule has an anticodon complementary to a degenerate codon sequence, which is one of a plurality of codon sequences encoding a naturally occurring amino acid. Such a codon is said to be degenerate. According to the methods of this embodiment of the invention, a second nucleic acid encoding an orthogonal/modified aminoacyl tRNA synthetase (AARS) is also introduced into the cell/IVT. The orthogonal/modified AARS is capable of charging the orthogonal/modified tRNA with a chosen amino acid analog. The amino acid analog can then be provided to the cell so that it can be incorporated into one or more proteins within the cell or IVT.

Thus in certain embodiments, the environment is an in vitro translation system. For example, suitable IVT systems include the Wheat Germ Lysate-based PROTEINscript-PRO™, Ambion's E. coli system for coupled in vitro transcription/translation, or the rabbit reticulocyte lysate-based Retic Lysate IVT™ Kit from Ambion. Optionally, the in vitro translation system can be selectively depleted of one or more natural AARSs (by, for example, immunodepletion using immobilized antibodies against natural AARS) and/or natural amino acids so that enhanced incorporation of the analog can be achieved. Alternatively, nucleic acids encoding the re-designed AARSs may be supplied in place of recombinantly produced AARSs. The in vitro translation system is also supplied with the analogs to be incorporated into mature protein products.

In other embodiments, the environment is a cell. A variety of cells (or lysates thereof suitable for IVT) can be used in the methods of the invention, including, for example, a bacterial cell, a fungal cell, an insect cell, and a mammalian cell (e.g. a human cell or a non-human mammal cell). In one embodiment, the cell is an E. coli cell.

In certain embodiments, the amino acid analog can be provided by directly contacting the cell or IVT with the analog, for example, by applying a solution of the analog to the cell in culture, or by directly adding the analog to the IVT. The analog can also be provided by introducing one or more additional nucleic acid construct(s) into the cell/IVT, wherein the additional nucleic acid construct(s) encodes one or more amino acid analog synthesis proteins that are necessary for synthesis of the desired analog.

In certain embodiments, the additional nucleic acid construct(s) has an inducible promoter sequence that can induce expression of the one or more synthesis proteins.

The methods of this embodiment of the invention further involve introducing a template nucleic acid construct into the cell/IVT, the template encoding a protein, wherein the nucleic acid construct contains at least one degenerate codon sequence.

The nucleic acids introduced into the cell/IVT can be introduced as one construct or as a plurality of constructs. In certain embodiments, the various nucleic acids are included in the same construct. For example, the nucleic acids can be introduced in any suitable vectors capable of expressing the encoded tRNA and/or proteins in the cell/IVT. In one embodiment, the first and second nucleic acid sequences are provided in one or more plasmids. In another embodiment, the vector or vectors used are viral vectors, including, for example, adenoviral and lentiviral vectors. The sequences can be introduced with an appropriate promoter sequence for the cell/IVT, or multiple sequences that can be inducible for controlling the expression of the sequences.

In certain embodiments, the plasmid or plasmids containing the subject polynucleotides have one or more selectable markers, such as antibiotic resistance genes.

In certain embodiments, the first polynucleotide further comprises a first promoter sequence controlling the expression of the modified tRNA. The first promoter is an inducible promoter.

In certain embodiments, the second polynucleotide further comprises a second promoter sequence controlling the expression of the modified AARS.

In certain embodiments, the cell is auxotrophic for the amino acid naturally encoded by the degenerate codon.

In certain embodiments, the cell is auxotrophic for the natural amino acid encoded at the specified position.

In certain embodiments, the environment lacks endogenous tRNA that forms Watson-Crick base-pairing with the codon at the specified position.

When the cell has a tRNA that has an anticodon perfectly complementary to the degenerate codon, the methods can include a step of disabling the gene encoding such an endogenous tRNA.

Alternatively, the environment is a cell, and the method further comprises inhibiting one or more endogenous AARS that charges tRNAs that form Watson-Crick base-pairing with the codon.

In certain embodiments, the orthogonal tRNA and orthogonal aminoacyl tRNA-synthetase can be derived from an organism from a different species than that of the cell/the IVT. For example, a yeast tRNA and a yeast AARS may be used with an E. coli cell.

In certain embodiments, the method further comprises verifying the incorporation of the analog by, for example, mass spectrometry.

In certain embodiments, the method incorporates the analog into the position at an efficiency of at least about 50%, or 60%, 70%, 80%, 90%, 95%, 99% or nearly 100%.

II. Definitions

Before describing the present invention in detail, it is to be understood that this invention is not limited to particular compositions or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular illustrative embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a molecule” optionally includes a combination of two or more such molecules, and the like.

Unless specifically defined below, the terms used in this specification generally have their ordinary meanings in the art, within the general context of this invention and in the specific context where each term is used. Certain terms are discussed below or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of the invention and how to make and use them. The scope and meaning of any use of a term will be apparent from the specific context in which the term is used.

“About” and “approximately” shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Typical, exemplary degrees of error are within 20 percent (%), preferably within 10%, and more preferably within 5% of a given value or range of values. Alternatively, and particularly in biological systems, the terms “about” and “approximately” may mean values that are within an order of magnitude, preferably within 5-fold and more preferably within 2-fold of a given value. Numerical quantities given herein are approximate unless stated otherwise, meaning that the term “about” or “approximately” can be inferred when not expressly stated.

“Amino acid analog,” “non-canonical amino acid,” or “non-standard amino acid,” used interchangeably, is meant to include all amino acid-like compounds that are similar in structure and/or overall shape to one or more of the twenty L-amino acids commonly found in naturally occurring proteins (Ala or A, Cys or C, Asp or D, Glu or E, Phe or F, Gly or G, His or H, Ile or I, Lys or K, Leu or L, Met or M, Asn or N, Pro or P, Gln or Q, Arg or R, Ser or S, Thr or T, Val or V, Trp or W, Tyr or Y, as defined and listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3). Amino acid analogs can also be natural amino acids with modified side chains or backbones. Preferably, these analogs usually are not “substrates” for the aminoacyl tRNA synthetases (AARSs) because of the normally high specificity of the AARSs. Occasionally, however, certain analogs with structures/shapes sufficiently close to those of natural amino acids may be erroneously incorporated into proteins by AARSs, especially modified AARSs with relaxed substrate specificity. In a preferred embodiment, the analogs share the backbone structures, and/or even most of the side chain structures, of one or more natural amino acids, with the only difference(s) being one or more modified groups in the molecule. Such modification may include, without limitation, substitution of an atom (such as N) for a related atom (such as S), addition of a group (such as methyl, or hydroxyl group, etc.) or an atom (such as Cl or Br, etc.), deletion of a group (supra), substitution of a covalent bond (single bond for double bond, etc.), or combinations thereof. Amino acid analogs may include α-hydroxy acids, and β-amino acids, and can also be referred to as “modified amino acids,” or “unnatural AARS substrates.”

The amino acid analogs may either be naturally occurring or unnaturally occurring (e.g. synthesized). As will be appreciated by those in the art, any structure for which a set of rotamers is known or can be generated can be used as an amino acid analog. The side chains may be in either the (R) or the (S) configuration (or D- or L-configuration). In a preferred embodiment, the amino acids are in the (S) or L-configuration.

Preferably, the overall shape and size of the amino acid analogs are such that, upon being charged to (natural or re-designed) tRNAs by (natural or re-designed) AARS, the analog-tRNA is a ribosomally accepted complex, i.e., the tRNA-analog complex can be accepted by the prokaryotic or eukaryotic ribosomes in an in vivo or in vitro translation system.

“Anchor residues” are residue positions in AARS that maintain critical interactions between the AARS and the natural amino acid backbone.

“Backbone,” or “template” includes the backbone atoms and any fixed side chains (such as the anchor residue side chains) of the protein (e.g., AARS). For calculation purposes, the backbone of an analog is treated as part of the AARS backbone.

“Protein backbone structure” or grammatical equivalents herein refers to the three dimensional coordinates that define the three dimensional structure of a particular protein. The structures which comprise a protein backbone structure (of a naturally occurring protein) are the nitrogen, the carbonyl carbon, the α-carbon, and the carbonyl oxygen, along with the direction of the vector from the α-carbon to the β-carbon.

The protein backbone structure which is input into the computer can either include the coordinates for both the backbone and the amino acid side chains, or just the backbone, i.e. with the coordinates for the amino acid side chains removed. If the former is done, the side chain atoms of each amino acid of the protein structure may be “stripped” or removed from the structure of a protein, as is known in the art, leaving only the coordinates for the “backbone” atoms (the nitrogen, carbonyl carbon and oxygen, and the α-carbon, and the hydrogens attached to the nitrogen and α-carbon).

Optionally, the protein backbone structure may be altered prior to the analysis outlined below. In this embodiment, the representation of the starting protein backbone structure is reduced to a description of the spatial arrangement of its secondary structural elements. The relative positions of the secondary structural elements are defined by a set of parameters called supersecondary structure parameters. These parameters are assigned values that can be systematically or randomly varied to alter the arrangement of the secondary structure elements to introduce explicit backbone flexibility. The atomic coordinates of the backbone are then changed to reflect the altered supersecondary structural parameters, and these new coordinates are input into the system for use in the subsequent protein design automation. For details, see U.S. Pat. No. 6,269,312, the entire content incorporated herein by reference.

“Conformational energy” refers generally to the energy associated with a particular “conformation”, or three-dimensional structure, of a macromolecule, such as the energy associated with the conformation of a particular protein. Interactions that tend to stabilize a protein have energies that are represented as negative energy values, whereas interactions that destabilize a protein have positive energy values. Thus, the conformational energy for any stable protein is quantitatively represented by a negative conformational energy value. Generally, the conformational energy for a particular protein will be related to that protein's stability. In particular, molecules that have a lower (i.e., more negative) conformational energy are typically more stable, e.g., at higher temperatures (i.e., they have greater “thermal stability”). Accordingly, the conformational energy of a protein may also be referred to as the “stabilization energy.”

Typically, the conformational energy is calculated using an energy “force-field” that calculates or estimates the energy contribution from various interactions which depend upon the conformation of a molecule. The force-field is comprised of terms that include the conformational energy of the alpha-carbon backbone, side chain-backbone interactions, and side chain-side chain interactions. Typically, interactions with the backbone or side chain include terms for bond rotation, bond torsion, and bond length. The backbone-side chain and side chain-side chain interactions include van der Waals interactions, hydrogen-bonding, electrostatics and solvation terms. Electrostatic interactions may include coulombic interactions, dipole interactions and quadrupole interactions. Other similar terms may also be included. Force-fields that may be used to determine the conformational energy for a polymer are well known in the art and include the CHARMM (see, Brooks et al., J. Comp. Chem. 1983, 4:187-217; MacKerell et al., in The Encyclopedia of Computational Chemistry, Vol. 1:271-277, John Wiley & Sons, Chichester, 1998), AMBER (see, Cornell et al., J. Amer. Chem. Soc. 1995, 117:5179; Woods et al., J. Phys. Chem. 1995, 99:3832-3846; Weiner et al., J. Comp. Chem. 1986, 7:230; and Weiner et al., J. Amer. Chem. Soc. 1984, 106:765) and DREIDING (Mayo et al., J. Phys. Chem. 1990, 94:8897) force-fields, to name but a few.

In a preferred implementation, the hydrogen bonding and electrostatics terms are as described in Dahiyat & Mayo, Science 1997, 278:82. The force field can also be described to include atomic conformational terms (bond angles, bond lengths, torsions), as in other references. See e.g., Nielsen J E, Andersen K V, Honig B, Hooft R W W, Klebe G, Vriend G, & Wade R C, “Improving macromolecular electrostatics calculations,” Protein Engineering, 12: 657-662 (1999); Sitkoff D, Lockhart D J, Sharp K A & Honig B, “Calculation of electrostatic effects at the amino-terminus of an alpha-helix,” Biophys. J., 67: 2251-2260 (1994); Hendsch Z S, Tidor B, “Do salt bridges stabilize proteins? A continuum electrostatic analysis,” Protein Science, 3: 211-226 (1994); Schneider J P, Lear J D, DeGrado W F, “A designed buried salt bridge in a heterodimeric coil,” J. Am. Chem. Soc., 119: 5742-5743 (1997); Sindelar C V, Hendsch Z S, Tidor B, “Effects of salt bridges on protein structure and design,” Protein Science, 7: 1898-1914 (1998). Solvation terms could also be included. See e.g., Jackson S E, Moracci M, elMasry N, Johnson C M, Fersht A R, “Effect of Cavity-Creating Mutations in the Hydrophobic Core of Chymotrypsin Inhibitor 2,” Biochemistry, 32: 11259-11269 (1993); Eisenberg, D & McLachlan A D, “Solvation Energy in Protein Folding and Binding,” Nature, 319: 199-203 (1986); Street A G & Mayo S L, “Pairwise Calculation of Protein Solvent-Accessible Surface Areas,” Folding & Design, 3: 253-258 (1998); Eisenberg D & Wesson L, “Atomic solvation parameters applied to molecular dynamics of proteins in solution,” Protein Science, 1: 227-235 (1992); Gordon & Mayo, supra.

“Coupled residues” are residues in a molecule that interact, through any mechanism. The interaction between the two residues is therefore referred to as a “coupling interaction.” Coupled residues generally contribute to polymer fitness through the coupling interaction. Typically, the coupling interaction is a physical or chemical interaction, such as an electrostatic interaction, a van der Waals interaction, a hydrogen bonding interaction, or a combination thereof. As a result of the coupling interaction, changing the identity of either residue will affect the “fitness” of the molecule, particularly if the change disrupts the coupling interaction between the two residues. Coupling interaction may also be described by a distance parameter between residues in a molecule. If the residues are within a certain cutoff distance, they are considered interacting.
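The distance-based description of coupling can be sketched as follows. This is only an illustration under stated assumptions: each residue is reduced to a single representative coordinate (for example, an α-carbon position), and the function name `coupled_pairs`, the coordinates, and the cutoff value are all hypothetical.

```python
from itertools import combinations
from math import dist


def coupled_pairs(coords: dict, cutoff: float) -> list:
    """Return residue pairs whose representative points lie within `cutoff`.

    coords maps a residue identifier to an (x, y, z) tuple.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(coords), 2)
        if dist(coords[a], coords[b]) <= cutoff
    ]


# Hypothetical coordinates in angstroms, one point per residue.
example_coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.0, 4.0, 0.0),   # 5.0 from residue 1
    3: (20.0, 0.0, 0.0),  # far from both
}
print(coupled_pairs(example_coords, cutoff=6.0))
```

With a 6.0 cutoff only residues 1 and 2 are treated as interacting in this toy example.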

“Fitness” is used to denote the level or degree to which a particular property or a particular combination of properties for a molecule, e.g., a protein, are optimized. In certain embodiments of the invention, the fitness of a protein is preferably determined by properties which a user wishes to improve. Thus, for example, the fitness of a protein may refer to the protein's thermal stability, catalytic activity, binding affinity, solubility (e.g., in aqueous or organic solvent), and the like. Other examples of fitness properties include enantioselectivity, activity towards unnatural substrates, and alternative catalytic mechanisms. Coupling interactions can be modeled as a way of evaluating or predicting fitness (stability). Fitness can be determined or evaluated experimentally or theoretically, e.g. computationally.

Preferably, the fitness is quantitated so that each molecule, e.g., each amino acid will have a particular “fitness value”. For example, the fitness of a protein may be the rate at which the protein catalyzes a particular chemical reaction, or the protein's binding affinity for a ligand. In a particularly preferred embodiment, the fitness of a protein refers to the conformational energy of the polymer and is calculated, e.g., using any method known in the art. See, e.g. Brooks B. R., Bruccoleri R E, Olafson, B D, States D J, Swaminathan S & Karplus M, “CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations,” J. Comp. Chem., 4: 187-217 (1983); Mayo S L, Olafson B D & Goddard W A G, “DREIDING: A Generic Force Field for Molecular Simulations,” J. Phys. Chem., 94: 8897-8909 (1990); Pabo C O & Suchanek E G, “Computer-Aided Model-Building Strategies for Protein Design,” Biochemistry, 25: 5987-5991 (1986), Lazar G A, Desjarlais J R & Handel T M, “De Novo Design of the Hydrophobic Core of Ubiquitin,” Protein Science, 6: 1167-1178 (1997); Lee C & Levitt M, “Accurate Prediction of the Stability and Activity Effects of Site Directed Mutagenesis on a Protein Core,” Nature, 352: 448-451 (1991); Colombo G & Merz K M, “Stability and Activity of Mesophilic Subtilisin E and Its Thermophilic Homolog: Insights from Molecular Dynamics Simulations,” J. Am. Chem. Soc., 121: 6895-6903 (1999); Weiner S J, Kollman P A, Case D A, Singh U C, Ghio C, Alagona G, Profeta S J, Weiner P, “A new force field for molecular mechanical simulation of nucleic acids and proteins,” J. Am. Chem. Soc., 106: 765-784 (1984). Generally, the fitness of a protein is quantitated so that the fitness value increases as the property or combination of properties is optimized. For example, in embodiments where the thermal stability of a protein is to be optimized (conformational energy is preferably decreased), the fitness value may be the negative conformational energy; i.e., F=−E.
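The relation F=−E can be illustrated with a short sketch. The variant names and energy values below are hypothetical placeholders, not computed results; the point is only that the variant with the lowest (most negative) conformational energy receives the highest fitness value.

```python
def fitness_from_energy(conformational_energy: float) -> float:
    """Fitness value for stability optimization: F = -E."""
    return -conformational_energy


# Hypothetical conformational energies (arbitrary units) for three variants.
energies = {"variant_A": -120.5, "variant_B": -98.0, "variant_C": -131.2}

fitness = {name: fitness_from_energy(e) for name, e in energies.items()}
best_variant = max(fitness, key=fitness.get)
print(best_variant, fitness[best_variant])
```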

The “fitness contribution” of a protein residue refers to the level or extent f(i_(a)) to which the residue i_(a), having an identity a, contributes to the total fitness of the protein. Thus, for example, if changing or mutating a particular amino acid residue will greatly decrease the protein's fitness, that residue is said to have a high fitness contribution to the polymer. By contrast, typically some residues i_(a) in a protein may have a variety of possible identities a without affecting the protein's fitness. Such residues, therefore, have a low contribution to the protein fitness.

“Dead-end elimination” (DEE) is a deterministic search algorithm that seeks to systematically eliminate bad rotamers and combinations of rotamers until a single solution remains. For example, amino acid residues can be modeled as rotamers that interact with a fixed backbone. The theoretical basis for DEE provides that, if the DEE search converges, the solution is the global minimum energy conformation (GMEC) with no uncertainty (Desmet et al., 1992).

Dead-end elimination is based on the following concept. Consider two rotamers, i_(r) and i_(t), at residue i, and the set of all other rotamer configurations {S} at all residues excluding i (of which rotamer j_(s) is a member). If the pairwise energy contributed between i_(r) and j_(s) is higher than the pairwise energy between i_(t) and j_(s) for all {S}, then rotamer i_(r) cannot exist in the global minimum energy conformation, and can be eliminated. This notion is expressed mathematically by the inequality:

$\begin{matrix}{{{E\left( i_{r} \right)} + {\sum\limits_{j \neq i}^{N}{E\left( {i_{r},j_{s}} \right)}}} > {{E\left( i_{t} \right)} + {\sum\limits_{j \neq i}^{N}{{E\left( {i_{t},j_{s}} \right)}\left\{ S \right\}}}}} & \left( {{Equation}\mspace{14mu} A} \right)\end{matrix}$

If this expression is true, the single rotamer i_(r) can be eliminated(Desmet et al., 1992).

In this form, Equation A is not computationally tractable because, to make an elimination, it is required that the entire sequence (rotamer) space be enumerated. To simplify the problem, bounds implied by Equation A can be utilized:

$\begin{matrix}{{{E\left( i_{r} \right)} + {\sum\limits_{j \neq i}^{N}{{\min (s)}{E\left( {i_{r},j_{s}} \right)}}}} > {{E\left( i_{t} \right)} + {\sum\limits_{j \neq i}^{N}{{\max (s)}{E\left( {i_{t},j_{s}} \right)}\left\{ S \right\}}}}} & \left( {{Equation}\mspace{14mu} B} \right)\end{matrix}$
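The singles bound of Equation B can be illustrated with a short sketch. The Python below is only a minimal illustration under stated assumptions, not the implementation used in the invention: the function name `eliminate_singles`, the data layout (`self_e` for E(i_r), `pair_e` for E(i_r, j_s)) and the toy energies are all hypothetical. Rotamers that have already been eliminated are skipped when taking the minima and maxima, which is valid because they cannot be part of the GMEC.

```python
def eliminate_singles(self_e, pair_e):
    """One pass of the singles criterion (Equation B).

    self_e[i][r]          -> E(i_r), self energy of rotamer r at residue i
    pair_e[(i, j)][r][s]  -> E(i_r, j_s), given for every ordered pair i != j
    Returns a list of sets holding the surviving rotamer indices per residue.
    """
    n = len(self_e)
    alive = [set(range(len(rotamers))) for rotamers in self_e]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in sorted(alive[i]):
            # Left-hand side: the best (lowest) energy rotamer r can achieve.
            best_r = self_e[i][r] + sum(
                min(pair_e[(i, j)][r][s] for s in alive[j]) for j in others)
            for t in sorted(alive[i] - {r}):
                # Right-hand side: the worst (highest) energy of competitor t.
                worst_t = self_e[i][t] + sum(
                    max(pair_e[(i, j)][t][s] for s in alive[j]) for j in others)
                if best_r > worst_t:
                    alive[i].discard(r)  # r is dead-ending
                    break
    return alive


# Hypothetical toy system: two residues with two rotamers each.
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {
    (0, 1): [[0.0, 1.0], [0.5, 2.0]],
    (1, 0): [[0.0, 0.5], [1.0, 2.0]],  # transpose of the (0, 1) table
}
survivors = eliminate_singles(self_e, pair_e)
print(survivors)
```

In this toy system each residue keeps only rotamer 0, so the search has converged on a single conformation. In a realistic problem the pass would be repeated, and combined with the doubles criterion, until no further eliminations occur.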

Using an analogous argument, Equation B can be extended to the elimination of pairs of rotamers inconsistent with the GMEC. This is done by determining that a pair of rotamers i_(r) at residue i and j_(s) at residue j always contribute higher energies than rotamers i_(u) and j_(v) with all possible rotamer combinations {L}. Similar to Equation B, the strict bound of this statement is given by:

$\begin{matrix}{{{ɛ\left( {i_{r},j_{s}} \right)} + {\sum\limits_{{k \neq i},j}^{N}{{\min (t)}{ɛ\left( {i_{r},j_{s},k_{t}} \right)}}}} > {{ɛ\left( {i_{u},j_{v}} \right)} + {\sum\limits_{{k \neq i},j}^{N}{{\max (t)}{ɛ\left( {i_{u},j_{v},k_{i}} \right)}}}}} & \left( {{Equation}\mspace{14mu} C} \right)\end{matrix}$

where ε is the combined energies for rotamer pairs

ε(i_(r), j_(s)) = E(i_(r)) + E(j_(s)) + E(i_(r), j_(s))  (Equation D),

and

ε(i_(r), j_(s), k_(t)) = E(i_(r), k_(t)) + E(j_(s), k_(t))  (Equation E).

This leads to the doubles elimination of the pair of rotamers i_(r) and j_(s), but does not eliminate the individual rotamers completely, as either could exist independently in the GMEC. The doubles elimination step reduces the number of possible pairs (reduces S) that need to be evaluated in the right-hand side of Equation B, allowing more rotamers to be individually eliminated.

The singles and doubles criteria presented by Desmet et al. fail to discover special conditions that lead to the determination of more dead-ending rotamers. For instance, it is possible that the energy contribution of rotamer i_(t) is always lower than i_(r) without the maximum of i_(t) being below the minimum of i_(r). To address this problem, Goldstein (1994) presented a modification of the criteria that determines if the energy profiles of two rotamers cross. If they do not, the higher energy rotamer can be determined to be dead-ending. The doubles calculation requires significantly more computational time than the singles calculation. To accelerate the process, other computational methods have been developed to predict the doubles calculations that will be the most productive (Gordon & Mayo, 1998). These kinds of modifications, collectively referred to as fast doubles, significantly improved the speed and effectiveness of DEE.
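The difference between the two singles criteria can be made concrete with a small sketch. The form of the Goldstein test used here, E(i_r) − E(i_t) + Σ_j min_s [E(i_r, j_s) − E(i_t, j_s)] > 0, is the form commonly stated in the literature and is an assumption of this example, since the text above does not spell it out. The function names, the data layout and the toy energies are likewise hypothetical.

```python
def desmet_eliminates(self_e, pair_e, i, r, t):
    """Original singles bound: min of r's profile must exceed max of t's."""
    others = [j for j in range(len(self_e)) if j != i]
    best_r = self_e[i][r] + sum(min(pair_e[(i, j)][r]) for j in others)
    worst_t = self_e[i][t] + sum(max(pair_e[(i, j)][t]) for j in others)
    return best_r > worst_t


def goldstein_eliminates(self_e, pair_e, i, r, t):
    """Goldstein-style test: r is dead-ending if its energy profile lies
    above that of t for every choice of neighbouring rotamers."""
    others = [j for j in range(len(self_e)) if j != i]
    gap = self_e[i][r] - self_e[i][t]
    for j in others:
        # Smallest energy advantage of t over r at neighbouring residue j.
        gap += min(e_r - e_t
                   for e_r, e_t in zip(pair_e[(i, j)][r], pair_e[(i, j)][t]))
    return gap > 0


# Hypothetical toy system: rotamer 0 at residue 0 is always 1 unit worse
# than rotamer 1, but both have widely spread pair energies.
self_e = [[1.0, 0.0], [0.0, 0.0]]
pair_e = {
    (0, 1): [[0.0, 10.0], [0.0, 10.0]],
    (1, 0): [[0.0, 0.0], [10.0, 10.0]],
}
print(desmet_eliminates(self_e, pair_e, 0, 0, 1))
print(goldstein_eliminates(self_e, pair_e, 0, 0, 1))
```

Here the original bound cannot eliminate rotamer 0 at residue 0, because its minimum energy (1) is below the maximum energy of rotamer 1 (10), while the Goldstein-style test eliminates it because the two energy profiles never cross.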

Several other modifications also enhance DEE. Rotamers from multiple residues can be combined into so-called super-rotamers to prompt further eliminations (Desmet et al., 1994; Goldstein, 1994). This has the advantage of eliminating multiple rotamers in a single step. In addition, it has been shown that “splitting” the conformational space between rotamers improves the efficiency of DEE (Pierce et al., 2000). Splitting handles the following special case. Consider rotamer i_(r). If a rotamer i_(t1) contributes a lower energy than i_(r) for a portion of the conformational space, and a rotamer i_(t2) has a lower energy than i_(r) for the remaining fraction, then i_(r) can be eliminated. This case would not be detected by the less sensitive Desmet or Goldstein criteria. In the preferred implementations of the invention as described herein, all of the described enhancements to DEE were used.

For further discussion of these methods see Goldstein, R. F. (1994), Efficient rotamer elimination applied to protein side-chains and related spin glasses, Biophysical Journal 66, 1335-1340; Desmet, J., De Maeyer, M., Hazes, B. & Lasters, I. (1992), The dead-end elimination theorem and its use in protein side-chain positioning, Nature 356, 539-542; Desmet, J., De Maeyer, M. & Lasters, I. (1994), In The Protein Folding Problem and Tertiary Structure Prediction (Jr., K. M. & Grand, S. L., eds.), pp. 307-337 (Birkhauser, Boston); De Maeyer, M., Desmet, J. & Lasters, I. (1997), All in one: a highly detailed rotamer library improves both accuracy and speed in the modeling of side chains by dead-end elimination, Folding & Design 2, 53-66; Gordon, D. B. & Mayo, S. L. (1998), Radical performance enhancements for combinatorial optimization algorithms based on the dead-end elimination theorem, Journal of Computational Chemistry 19, 1505-1514; Pierce, N. A., Spriet, J. A., Desmet, J. & Mayo, S. L. (2000), Conformational splitting: A more powerful criterion for dead-end elimination, Journal of Computational Chemistry 21, 999-1009.

“Expression system” means a host cell and compatible vector under suitable conditions, e.g., for the expression of a protein coded for by foreign DNA carried by the vector and introduced to the host cell. Common expression systems include E. coli host cells and plasmid vectors, insect host cells such as Sf9, Hi5 or S2 cells and Baculovirus vectors, Drosophila cells (Schneider cells) and expression systems, and mammalian host cells and vectors.

“Host cell” means any cell of any organism that is selected, modified, transformed, grown or used or manipulated in any way for the production of a substance by the cell. For example, a host cell may be one that is manipulated to express a particular gene, a DNA or RNA sequence, a protein or an enzyme. Host cells may be cultured in vitro or may be one or more cells in a non-human animal (e.g., a transgenic animal or a transiently transfected animal).

The methods of the invention may include steps of comparing sequences to each other, including wild-type sequence to one or more mutants. Such comparisons typically comprise alignments of polymer sequences, e.g., using sequence alignment programs and/or algorithms that are well known in the art (for example, BLAST, FASTA and MEGALIGN, to name a few). The skilled artisan can readily appreciate that, in such alignments, where a mutation contains a residue insertion or deletion, the sequence alignment will introduce a “gap” (typically represented by a dash, “-”, or “Δ”) in the polymer sequence not containing the inserted or deleted residue.

“Homologous”, in all its grammatical forms and spelling variations, refers to the relationship between two molecules (e.g., proteins, tRNAs, nucleic acids) that possess a “common evolutionary origin”, including proteins from superfamilies in the same species of organism, as well as homologous proteins from different species of organism. Such proteins (and their encoding nucleic acids) have sequence and/or structural homology, as reflected by their sequence similarity, whether in terms of percent identity or by the presence of specific residues or motifs and conserved positions. Homologous molecules frequently also share similar or even identical functions.

The term “sequence similarity”, in all its grammatical forms, refers to the degree of identity or correspondence between nucleic acid or amino acid sequences that may or may not share a common evolutionary origin (see Reeck et al., supra). However, in common usage and in the instant application, the term “homologous”, when modified with an adverb such as “highly”, may refer to sequence similarity and may or may not relate to a common evolutionary origin.

A nucleic acid molecule is “hybridizable” to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA, when a single stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength (see Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). The conditions of temperature and ionic strength determine the “stringency” of the hybridization. For preliminary screening for homologous nucleic acids, low stringency hybridization conditions, corresponding to a T_(m) (melting temperature) of 55° C., can be used (e.g., 5×SSC, 0.1% SDS, 0.25% milk, and no formamide; or 30% formamide, 5×SSC, 0.5% SDS). Moderate stringency hybridization conditions correspond to a higher T_(m), e.g., 40% formamide, with 5× or 6×SSC. High stringency hybridization conditions correspond to the highest T_(m), e.g., 50% formamide, 5× or 6×SSC. SSC is 0.15M NaCl, 0.015M Na-citrate. Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of T_(m) for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher T_(m)) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating T_(m) have been derived (see Sambrook et al., supra, 9.50-9.51). For hybridization with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see Sambrook et al., supra, 11.7-11.8). A minimum length for a hybridizable nucleic acid is at least about 10 nucleotides; preferably at least about 15 nucleotides; and more preferably the length is at least about 20 nucleotides.

Unless specified, the term “standard hybridization conditions” refers to a T_(m) of about 55° C., and utilizes conditions as set forth above. In a preferred embodiment, the T_(m) is 60° C.; in a more preferred embodiment, the T_(m) is 65° C. In a specific embodiment, “high stringency” refers to hybridization and/or washing conditions at 68° C. in 0.2×SSC, at 42° C. in 50% formamide, 4×SSC, or under conditions that afford levels of hybridization equivalent to those observed under either of these two conditions.

Suitable hybridization conditions for oligonucleotides (e.g., for oligonucleotide probes or primers) are typically somewhat different than for full-length nucleic acids (e.g., full-length cDNA), because of the oligonucleotides' lower melting temperature. Because the melting temperature of oligonucleotides will depend on the length of the oligonucleotide sequences involved, suitable hybridization temperatures will vary depending upon the oligonucleotide molecules used. Exemplary temperatures may be 37° C. (for 14-base oligonucleotides), 48° C. (for 17-base oligonucleotides), 55° C. (for 20-base oligonucleotides) and 60° C. (for 23-base oligonucleotides). Exemplary suitable hybridization conditions for oligonucleotides include washing in 6×SSC/0.05% sodium pyrophosphate, or other conditions that afford equivalent levels of hybridization.
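The dependence of oligonucleotide melting temperature on length and composition can be illustrated with a rule of thumb. The sketch below uses the Wallace rule (2° C. per A/T pair plus 4° C. per G/C pair), which is a widely used approximation for short oligonucleotides and is not taken from this specification or from the Sambrook equations cited above; the function name `wallace_tm` and the example sequence are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) of a short oligonucleotide.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C). U is counted with A/T.
    This is only a coarse estimate and ignores salt and formamide.
    """
    seq = oligo.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T, U")
    return 2.0 * at + 4.0 * gc


# Hypothetical 14-base probe with 6 G/C and 8 A/T bases.
print(wallace_tm("ATGCATGCATGCAT"))
```

Longer or more G/C-rich oligonucleotides give higher estimates, which is consistent with the trend in the exemplary temperatures above; actual conditions would still be determined empirically.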

“Polypeptide,” “peptide” or “protein” are used interchangeably to describe a chain of amino acids that are linked together by chemical bonds called “peptide bonds.” A protein or polypeptide, including an enzyme, may be “native” or “wild-type”, meaning that it occurs in nature; or it may be a “mutant”, “variant” or “modified”, meaning that it has been made, altered, derived, or is in some way different or changed from a native protein or from another mutant.

“Rotamer” is defined as a set of possible conformers for each amino acid or analog side chain. See Ponder, et al., Acad. Press Inc. (London) Ltd. pp. 775-791 (1987); Dunbrack, et al., Struc. Biol. 1(5):334-340 (1994); Desmet, et al., Nature 356:539-542 (1992). A “rotamer library” is a collection of a set of possible/allowable rotameric conformations for a given set of amino acids or analogs. There are two general types of rotamer libraries: “backbone dependent” and “backbone independent.” A backbone dependent rotamer library allows different rotamers depending on the position of the residue in the backbone; thus, for example, certain leucine rotamers are allowed if the position is within an α-helix, and different leucine rotamers are allowed if the position is not in an α-helix. A backbone independent rotamer library utilizes all rotamers of an amino acid at every position. In general, a backbone independent library is preferred in the consideration of core residues, since flexibility in the core is important. However, backbone independent libraries are computationally more expensive, and thus for surface and boundary positions, a backbone dependent library is preferred. However, either type of library can be used at any position.

“Variable residue position” means an amino acid position of the protein to be designed that is not fixed in the design method as a specific residue or rotamer, generally the wild-type residue or rotamer. It should be noted that even if a position is chosen as a variable position, it is possible that the methods of the invention will optimize the sequence in such a way as to select the wild-type residue at the variable position. This generally occurs more frequently for core residues, and less regularly for surface residues. In addition, it is possible to fix residues as non-wild-type amino acids as well.

“Fixed residue position” means that the residue identified in the three dimensional structure is held in a set conformation. In some embodiments, a fixed position is left in its original conformation (which may or may not correlate to a specific rotamer of the rotamer library being used). Alternatively, residues may be fixed as a non-wild-type residue depending on design needs; for example, when known site-directed mutagenesis techniques have shown that a particular residue is desirable (for example, to eliminate a proteolytic site or alter the substrate specificity of an AARS), the residue may be fixed as a particular amino acid. Residues which can be fixed include, but are not limited to, structurally or biologically functional residues, for example, the anchor residues.

In certain embodiments, a fixed position may be “floated”; the amino acid or analog at that position is fixed, but different rotamers of that amino acid or analog are tested. In this embodiment, the variable residues may be at least one, or anywhere from 0.1% to 99.9% of the total number of residues. Thus, for example, it may be possible to change only a few (or one) residues, or most of the residues, with all possibilities in between.

As used herein, the term “orthogonal” refers to a molecule (e.g., an orthogonal tRNA (O-tRNA) and/or an orthogonal aminoacyl tRNA synthetase (O-RS)) that is used with reduced efficiency (as compared to wild-type or endogenous) by a system of interest (e.g., a translational system, e.g., a cell). Orthogonal refers to the inability or reduced efficiency, e.g., less than 20% efficient, less than 10% efficient, less than 5% efficient, or e.g., less than 1% efficient, of an orthogonal tRNA and/or orthogonal RS to function in the translation system of interest. For example, an orthogonal tRNA in a translation system of interest is aminoacylated by any endogenous RS of the translation system of interest with reduced or even zero efficiency, when compared to aminoacylation of an endogenous tRNA by the endogenous RS. In another example, an orthogonal RS aminoacylates any endogenous tRNA in the translation system of interest with reduced or even zero efficiency, as compared to aminoacylation of the endogenous tRNA by an endogenous RS. “Improvement in orthogonality” refers to enhanced orthogonality compared to a starting material or a naturally occurring tRNA or RS.

“Wobble degenerate codon” refers to a codon encoding a natural amino acid, which codon, when present in mRNA, is recognized by a natural tRNA anticodon through at least one non-Watson-Crick, or wobble, base-pairing (e.g., A-C or G-U base-pairing). Watson-Crick base-pairing refers to either the G-C or A-U (RNA or DNA/RNA hybrid) or A-T (DNA) base-pairing. When used in the context of mRNA codon-tRNA anticodon base-pairing, Watson-Crick base-pairing means all codon-anticodon base-pairings are mediated through either G-C or A-U.
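The distinction between strict Watson-Crick and wobble recognition can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `pairing_by_position` is hypothetical, both sequences are assumed to be written 5′→3′ (so the anticodon is reversed before pairing), and modified anticodon bases such as inosine are not modeled. The Phe example uses the anticodon GAA and the codons UUC and UUU discussed for E. coli later in this specification.

```python
# The four Watson-Crick RNA pairs, written as (codon base, anticodon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_by_position(codon, anticodon):
    """Classify the base pair at codon positions 1-3.

    Both sequences are given 5'->3'. Codon position 1 pairs with anticodon
    position 3, and codon position 3 pairs with anticodon position 1.
    """
    partners = anticodon.upper()[::-1]
    return ["W/C" if (c, a) in WATSON_CRICK else "non-W/C"
            for c, a in zip(codon.upper(), partners)]


def is_strict_watson_crick(codon, anticodon):
    """True when all three codon-anticodon pairs are G-C or A-U."""
    return all(p == "W/C" for p in pairing_by_position(codon, anticodon))


print(pairing_by_position("UUC", "GAA"))  # natural Phe anticodon, codon UUC
print(pairing_by_position("UUU", "GAA"))  # same anticodon, codon UUU
```

With the anticodon GAA, the codon UUC is read through three Watson-Crick pairs, whereas UUU is read through a G-U pair at the third codon position, which makes UUU a wobble degenerate codon in the sense defined above.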

As used herein, proteins and/or protein sequences are “homologous” when they are derived, naturally or artificially, from a common ancestral protein or protein sequence. Similarly, nucleic acids and/or nucleic acid sequences are homologous when they are derived, naturally or artificially, from a common ancestral nucleic acid or nucleic acid sequence. For example, any naturally occurring nucleic acid can be modified by any available mutagenesis method to include one or more selector codons. When expressed, this mutagenized nucleic acid encodes a polypeptide comprising one or more unnatural amino acids. The mutation process can, of course, additionally alter one or more standard codons, thereby changing one or more standard amino acids in the resulting mutant protein as well. Homology is generally inferred from sequence similarity between two or more nucleic acids or proteins (or sequences thereof). The precise percentage of similarity between sequences that is useful in establishing homology varies with the nucleic acid and protein at issue, but as little as 25% sequence similarity is routinely used to establish homology. Higher levels of sequence similarity, e.g., 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 99% or more, can also be used to establish homology. Methods for determining sequence similarity percentages (e.g., BLASTP and BLASTN using default parameters) are described herein and are generally available.
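As a simple illustration of a similarity percentage, the sketch below computes percent identity for two sequences that have already been aligned, with gaps shown as “-”. It is not a substitute for BLASTP or BLASTN: the function name `percent_identity` is hypothetical, the alignment itself is assumed to be supplied by another tool, and the choice to exclude gapped columns from the denominator is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments; the first carries a one-residue deletion.
print(percent_identity("ACD-FG", "ACDEFG"))
print(percent_identity("AAAA", "AABB"))
```

A value computed this way could be compared against the thresholds listed above (e.g., 25% or higher), bearing in mind that different denominators give somewhat different percentages.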

The term “preferentially aminoacylates” refers to an efficiency, e.g., about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 75%, about 85%, about 90%, about 95%, about 99% or more efficient, at which an O-RS aminoacylates an O-tRNA with an unnatural amino acid compared to a naturally occurring tRNA or starting material used to generate the O-tRNA. The unnatural amino acid is then incorporated into a growing polypeptide chain with high fidelity, e.g., at greater than about 20%, 30%, 40%, 50%, 60%, 75%, 80%, 90%, 95%, or greater than about 99% efficiency for a given codon.

The term “complementary” refers to components of an orthogonal pair, O-tRNA and O-RS, that can function together, e.g., the O-RS aminoacylates the O-tRNA.

The term “derived from” refers to a component that is isolated from an organism or isolated and modified, or generated, e.g., chemically synthesized, using information of the component from the organism.

The term “translation system” refers to the components necessary to incorporate a naturally occurring or unnatural amino acid into a growing polypeptide chain (protein). For example, components can include ribosomes, tRNA(s), synthetase(s), mRNA and the like. The components of the present invention can be added to a translation system, in vivo or in vitro. An in vivo translation system may be a cell (eukaryotic or prokaryotic cell). An in vitro translation system may be a cell-free system, such as one reconstituted with components from different organisms (purified or recombinantly produced).

The term “inactive RS” refers to a synthetase that has been mutated so that it can no longer aminoacylate its cognate tRNA with an amino acid.

The term “selection agent” refers to an agent that, when present, allows for a selection of certain components from a population, e.g., an antibiotic, wavelength of light, an antibody, a nutrient or the like. The selection agent can be varied, e.g., in concentration, intensity, etc.

The term “positive selection marker” refers to a marker that, when present, e.g., expressed, activated or the like, results in identification of an organism with the positive selection marker from those without the positive selection marker.

The term “negative selection marker” refers to a marker that, when present, e.g., expressed, activated or the like, allows identification of an organism that does not possess the desired property (e.g., as compared to an organism which does possess the desired property).

The term “reporter” refers to a component that can be used to select components described in the present invention. For example, a reporter can include a green fluorescent protein, a firefly luciferase protein, or genes such as β-gal/lacZ (β-galactosidase), Adh (alcohol dehydrogenase) or the like.

The term “not efficiently recognized” refers to an efficiency, e.g., less than about 10%, less than about 5%, or less than about 1%, at which an RS from one organism aminoacylates O-tRNA.

The term “eukaryote” refers to organisms belonging to the phylogenetic domain Eucarya such as animals (e.g., mammals, insects, reptiles, birds, etc.), ciliates, plants, fungi (e.g., yeasts, etc.), flagellates, microsporidia, protists, etc. Additionally, the term “prokaryote” refers to non-eukaryotic organisms belonging to the Eubacteria (e.g., Escherichia coli, Thermus thermophilus, etc.) and Archaea (e.g., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium such as Haloferax volcanii and Halobacterium species NRC-1, A. fulgidus, P. furiosus, P. horikoshii, A. pernix, etc.) phylogenetic domains.

III. The Genetic Code, Host Cells, and the Degenerate Codons

The standard genetic code most cells use is listed below.

The Genetic Code

First   Middle base                                       Last
base    U      C      A               G                   base
U       Phe    Ser    Tyr             Cys                 U
        Phe    Ser    Tyr             Cys                 C
        Leu    Ser    Stop (Ochre)    Stop (Umber)        A
        Leu    Ser    Stop (Amber)    Trp                 G
C       Leu    Pro    His             Arg                 U
        Leu    Pro    His             Arg                 C
        Leu    Pro    Gln             Arg                 A
        Leu    Pro    Gln             Arg                 G
A       Ile    Thr    Asn             Ser                 U
        Ile    Thr    Asn             Ser                 C
        Ile    Thr    Lys             Arg                 A
        Met    Thr    Lys             Arg                 G
G       Val    Ala    Asp             Gly                 U
        Val    Ala    Asp             Gly                 C
        Val    Ala    Glu             Gly                 A
        Val    Ala    Glu             Gly                 G

The genetic code is degenerate, in that the protein biosynthetic machinery utilizes 61 mRNA sense codons to direct the templated polymerization of the 20 natural amino acid monomers (Crick et al., Nature 192: 1227, 1961). Just two amino acids, i.e., methionine and tryptophan, are encoded by unique mRNA triplets.
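The degeneracy described above can be checked directly from the standard code. The sketch below is illustrative only: it builds the standard genetic code from the conventional compact 64-letter string (first, second and third codon bases each cycling through U, C, A, G; one-letter amino acid codes; “*” for stop), and the variable and function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the three stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Map each triplet (e.g. "AUG") to its one-letter amino acid.
CODON_TABLE = {"".join(triplet): aa
               for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_by_amino_acid(table):
    """Group sense codons by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        if aa != "*":
            groups[aa].append(codon)
    return dict(groups)


groups = codons_by_amino_acid(CODON_TABLE)
sense_codons = sum(len(codons) for codons in groups.values())
unique = sorted(aa for aa, codons in groups.items() if len(codons) == 1)
print(sense_codons, len(groups), unique)
```

Running the sketch gives 61 sense codons shared among 20 amino acids, with only M (methionine, AUG) and W (tryptophan, UGG) encoded by a single triplet; every other amino acid has at least one degenerate codon that is a candidate for reassignment.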

The standard genetic code applies to most, but not all, cases. Exceptions have been found in the mitochondrial DNA of many organisms and in the nuclear DNA of a few lower organisms. Some examples are given in the following table.

Examples of non-standard genetic codes.

Mitochondria
    Vertebrates      UGA → Trp; AGA, AGG → STOP
    Invertebrates    UGA → Trp; AGA, AGG → Ser
    Yeasts           UGA → Trp; CUN → Thr
    Protista         UGA → Trp
Nucleus
    Bacteria         GUG, UUG, AUU, CUG → initiation
    Yeasts           CUG → Ser
    Ciliates         UAA, UAG → Gln
*Plant cells use the standard genetic code in both mitochondria and the nucleus.

The NCBI (National Center for Biotechnology Information) maintains a detailed list of the standard genetic code, and genetic codes used in various organisms, including the vertebrate mitochondrial code; the yeast mitochondrial code; the mold, protozoan, and coelenterate mitochondrial code and the mycoplasma/spiroplasma code; the invertebrate mitochondrial code; the ciliate, dasycladacean and hexamita nuclear code; the echinoderm and flatworm mitochondrial code; the euplotid nuclear code; the bacterial and plant plastid code; the alternative yeast nuclear code; the ascidian mitochondrial code; the alternative flatworm mitochondrial code; the blepharisma nuclear code; the chlorophycean mitochondrial code; the trematode mitochondrial code; the Scenedesmus obliquus mitochondrial code; and the thraustochytrium mitochondrial code (all incorporated herein by reference). These are primarily based on the reviews by Osawa et al., Microbiol. Rev. 56: 229-264, 1992, and Jukes and Osawa, Comp. Biochem. Physiol. 106B: 489-494, 1993.

Host Cells

The methods of the invention can be practiced within a cell, which enables proteins to be produced at levels sufficient for practical purposes. Because of the high degree of conservation of the genetic code and the surrounding molecular machinery, the methods of the invention can be used in most cells.

In preferred embodiments, the cells used are culturable cells (i.e., cells that can be grown under laboratory conditions). Suitable cells include mammalian cells (human or non-human mammals), bacterial cells, insect cells, etc.

Degenerate Codon Selection

As described above, all amino acids, with the exception of methionine and tryptophan, are encoded by more than one codon. According to the methods of the invention, a codon that is normally used to encode a natural amino acid is reprogrammed to encode an amino acid analog. An amino acid analog can be a naturally occurring or canonical amino acid analog. In a preferred embodiment, the amino acid analog is not a canonically encoded amino acid.

The thermodynamic stability of a codon-anticodon pair can be predicted or determined experimentally. According to the invention, it is preferable that the orthogonal tRNA interacts with the degenerate codon with an affinity (at 37° C.) of at least about 1.0 kcal/mol more strongly, even more preferably 1.5 kcal/mol more strongly, and even more preferably more than 2.0 kcal/mol more strongly than a natural tRNA in the cell would recognize the same sequence. These values are known to one of skill in the art and can be determined by thermal denaturation experiments (see, e.g., Meroueh and Chow, Nucleic Acids Res. 27: 1118, 1999).
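To give a feel for what these free-energy differences mean, the sketch below converts a difference in binding free energy into a ratio of equilibrium binding constants using the Boltzmann relation exp(ΔΔG/RT). This is an illustrative simplification and not part of the invention: it assumes the free-energy difference maps directly onto relative equilibrium occupancy of the codon, and it ignores kinetic effects and relative tRNA abundance in the cell. The function name `fold_preference` is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def fold_preference(ddg_kcal_per_mol, temp_celsius=37.0):
    """Ratio of equilibrium binding constants implied by a free-energy gap.

    ddg_kcal_per_mol is how much more strongly (in kcal/mol) the orthogonal
    tRNA binds the codon than the competing natural tRNA.
    """
    rt = GAS_CONSTANT * (temp_celsius + 273.15)
    return math.exp(ddg_kcal_per_mol / rt)


for ddg in (1.0, 1.5, 2.0):
    print(f"{ddg:.1f} kcal/mol -> about {fold_preference(ddg):.1f}-fold")
```

Under these assumptions, the preferred thresholds of 1.0, 1.5 and 2.0 kcal/mol at 37° C. correspond to roughly 5-fold, 11-fold and 26-fold equilibrium preferences, respectively.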

The following table lists some of the known anti-codon sequences for E. coli. In general, for any organism, tRNA anticodon sequences can be routinely determined using art-recognized technologies. For example, any tRNA gene can be amplified by, for example, PCR. Sequencing can be performed to determine the exact sequences of the anti-codon loop. Alternatively, a biochemical binding assay may be used to determine the binding affinity of a purified tRNA to one of the 2-6 possible codons. The codon that binds the tRNA with the highest specificity/affinity presumably has a pure Watson-Crick match at all three codon positions, thus determining the sequence of the anti-codon loop.

In general, the wobble base in the anti-codon loop tends to be G or U (rather than A or C).

The Degenerate Codons for E. coli

Amino acid   Anticodon   Base pairing at 3rd base   Codon
Ala          GGC         W/C¹                       GCC
                         Wobble²                    GCU
             UGC         W/C                        GCA
                         Wobble                     GCG
Asp          GUC         W/C                        GAC
                         Wobble                     GAU
Asn          GUU         W/C                        AAC
                         Wobble                     AAU
Cys          GCA         W/C                        UGC
                         Wobble                     UGU
Glu          UUC         W/C                        GAA
                         Wobble                     GAG
Gly          GCC         W/C                        GGC
                         Wobble                     GGU
His          GUG         W/C                        CAC
                         Wobble                     CAU
Ile          GAU         W/C                        AUC
                         Wobble                     AUU
Leu          GAG         W/C                        CUC
                         Wobble                     CUU
Lys          UUU         W/C                        AAA
                         Wobble                     AAG
Phe          GAA         W/C                        UUC
                         Wobble                     UUU
Ser          GGA         W/C                        UCC
                         Wobble                     UCU
Tyr          GUA         W/C                        UAC
                         Wobble                     UAU

¹Watson-Crick base pairing. ²Wobble base pairing.

When the cell has a single tRNA that recognizes a codon through a perfect complementary interaction between the anticodon of the tRNA and one codon, and recognizes a second, degenerate codon through a wobble or other non-standard base pairing interaction, a new tRNA can be constructed having an anticodon sequence that is perfectly complementary to the degenerate codon.
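Constructing an anticodon that is perfectly complementary to a degenerate codon amounts to taking the reverse complement of that codon. The sketch below illustrates only this bookkeeping step; the function name `perfect_anticodon` is hypothetical, sequences are written 5′→3′, and base modifications of natural tRNAs are not considered. The Phe and Lys examples use the E. coli anticodons and codons listed in the table above.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_anticodon(codon):
    """Anticodon (5'->3') that pairs with the codon by Watson-Crick rules
    at all three positions, i.e. the reverse complement of the codon."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


# Phe: the natural anticodon GAA matches UUC; the wobble codon is UUU.
print(perfect_anticodon("UUC"))
print(perfect_anticodon("UUU"))
# Lys: the natural anticodon UUU matches AAA; the wobble codon is AAG.
print(perfect_anticodon("AAG"))
```

For the wobble codon UUU the perfectly complementary anticodon is AAA, and for AAG it is CUU; an orthogonal tRNA carrying such an anticodon would read the degenerate codon through three Watson-Crick pairs rather than through a wobble pair.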

When the cell has multiple tRNA molecules for a particular amino acid, and one tRNA has an anticodon sequence that is perfectly complementary to the degenerate codon selected, the gene encoding the tRNA can be disabled through any means available to one of skill in the art including, for example, site-directed mutagenesis or deletion of either the gene or the promoter sequence of the gene. Expression of the gene can also be disabled through any antisense or RNA interference techniques.

IV. Unnatural Amino Acids

The first step in the protein engineering process is usually to select a set of unnatural amino acids that have the desired chemical properties. The selection of unnatural amino acids depends on pre-determined chemical properties one would like to have, and the modifications one would like to make in the target protein. Unnatural amino acids, once selected, can either be purchased from vendors, or chemically synthesized.

A wide variety of unnatural amino acids can be used in the methods of the invention. The unnatural amino acid can be chosen based on desired characteristics of the unnatural amino acid, e.g., function of the unnatural amino acid, such as modifying protein biological properties such as toxicity, biodistribution, or half life, structural properties, spectroscopic properties, chemical and/or photochemical properties, catalytic properties, ability to react with other molecules (either covalently or noncovalently), or the like.

As used herein, an “unnatural amino acid” refers to any amino acid, modified amino acid, or amino acid analogue other than selenocysteine and the following twenty genetically encoded alpha-amino acids: alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, valine. The generic structure of an alpha-amino acid is illustrated by Formula I:

An unnatural amino acid is typically any structure having Formula I wherein the R group is any substituent other than one used in the twenty natural amino acids. See, e.g., any biochemistry text such as Biochemistry by L. Stryer, 3rd ed. 1988, Freeman and Company, New York, for structures of the twenty natural amino acids. Note that the unnatural amino acids of the present invention may be naturally occurring compounds other than the twenty alpha-amino acids above. Because the unnatural amino acids of the invention typically differ from the natural amino acids in side chain only, the unnatural amino acids form amide bonds with other amino acids, e.g., natural or unnatural, in the same manner in which they are formed in naturally occurring proteins. However, the unnatural amino acids have side chain groups that distinguish them from the natural amino acids. For example, R in Formula I optionally comprises an alkyl-, aryl-, acyl-, keto-, azido-, hydroxyl-, hydrazine, cyano-, halo-, hydrazide, alkenyl, alkynyl, ether, thiol, seleno-, sulfonyl-, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, ester, thioacid, hydroxylamine, amino group, or the like or any combination thereof. Other unnatural amino acids of interest include, but are not limited to, amino acids comprising a photoactivatable cross-linker, spin-labeled amino acids, fluorescent amino acids, metal binding amino acids, metal-containing amino acids, radioactive amino acids, amino acids with novel functional groups, amino acids that covalently or noncovalently interact with other molecules, photocaged and/or photoisomerizable amino acids, amino acids comprising biotin or a biotin analogue, glycosylated amino acids such as a sugar substituted serine, other carbohydrate modified amino acids, keto containing amino acids, amino acids comprising polyethylene glycol or polyether, heavy atom substituted amino acids, chemically cleavable and/or photocleavable amino acids, amino acids with elongated side chains as compared to natural amino acids, e.g., polyethers or long chain hydrocarbons, e.g., greater than about 5 or greater than about 10 carbons, carbon-linked sugar-containing amino acids, redox-active amino acids, amino thioacid containing amino acids, and amino acids comprising one or more toxic moieties.

In addition to unnatural amino acids that contain novel side chains, unnatural amino acids also optionally comprise modified backbone structures, e.g., as illustrated by the structures of Formula II and III:

wherein Z typically comprises OH, NH₂, SH, NH—R′, or S—R′; X and Y, which may be the same or different, typically comprise S or O, and R and R′, which are optionally the same or different, are typically selected from the same list of constituents for the R group described above for the unnatural amino acids having Formula I as well as hydrogen. For example, unnatural amino acids of the invention optionally comprise substitutions in the amino or carboxyl group as illustrated by Formulas II and III. Unnatural amino acids of this type include, but are not limited to, α-hydroxy acids, α-thioacids, and α-aminothiocarboxylates, e.g., with side chains corresponding to the common twenty natural amino acids or unnatural side chains. In addition, substitutions at the α-carbon optionally include L, D, or α-α-disubstituted amino acids such as D-glutamate, D-alanine, D-methyl-O-tyrosine, aminobutyric acid, and the like. Other structural alternatives include cyclic amino acids, such as proline analogues as well as 3, 4, 6, 7, 8, and 9 membered ring proline analogues, β and γ amino acids such as substituted β-alanine and γ-amino butyric acid.

For example, many unnatural amino acids are based on natural amino acids, such as tyrosine, glutamine, phenylalanine, and the like. Tyrosine analogs include para-substituted tyrosines, ortho-substituted tyrosines, and meta-substituted tyrosines, wherein the substituted tyrosine comprises an acetyl group, a benzoyl group, an amino group, a hydrazine, a hydroxyamine, a thiol group, a carboxy group, an isopropyl group, a methyl group, a C6-C20 straight chain or branched hydrocarbon, a saturated or unsaturated hydrocarbon, an O-methyl group, a polyether group, a nitro group, or the like. In addition, multiply substituted aryl rings are also contemplated. Glutamine analogs of the invention include, but are not limited to, α-hydroxy derivatives, β-substituted derivatives, cyclic derivatives, and amide substituted glutamine derivatives. Example phenylalanine analogs include, but are not limited to, meta-substituted phenylalanines, wherein the substituent comprises a hydroxy group, a methoxy group, a methyl group, an allyl group, an acetyl group, or the like.

Specific examples of unnatural amino acids include, but are not limited to, an O-methyl-L-tyrosine, an L-3-(2-naphthyl)alanine, a 3-methyl-phenylalanine, an O-4-allyl-L-tyrosine, a 4-propyl-L-tyrosine, a tri-O-acetyl-GlcNAcβ-serine, an L-Dopa, a fluorinated phenylalanine, an isopropyl-L-phenylalanine, a p-azido-L-phenylalanine, a p-acyl-L-phenylalanine, a p-benzoyl-L-phenylalanine, an L-phosphoserine, a phosphonoserine, a phosphonotyrosine, a p-iodo-phenylalanine, a p-bromophenylalanine, a p-amino-L-phenylalanine, and the like. The structures of a variety of non-limiting unnatural amino acids are provided in the figures, e.g., FIGS. 29, 30, and 31 of US 2003/0108885 A1 (entire content incorporated herein by reference).

Typically, the unnatural amino acids of the invention are selected or designed to provide additional characteristics unavailable in the twenty natural amino acids. For example, unnatural amino acids are optionally designed or selected to modify the biological properties of a protein into which they are incorporated. For example, the following properties are optionally modified by inclusion of an unnatural amino acid into a protein: toxicity, biodistribution, solubility, stability, e.g., thermal, hydrolytic, oxidative, resistance to enzymatic degradation, and the like, facility of purification and processing, structural properties, spectroscopic properties, chemical and/or photochemical properties, catalytic activity, redox potential, half-life, ability to react with other molecules, e.g., covalently or noncovalently, and the like.

Further details regarding unnatural amino acids are described in US 2003-0082575 A1, entitled “In vivo Incorporation of Unnatural Amino Acids,” filed on Apr. 19, 2002, which is incorporated herein by reference.

Additionally, other examples optionally include (but are not limited to) an unnatural analogue of a tyrosine amino acid; an unnatural analogue of a glutamine amino acid; an unnatural analogue of a phenylalanine amino acid; an unnatural analogue of a serine amino acid; an unnatural analogue of a threonine amino acid; an alkyl, aryl, acyl, azido, cyano, halo, hydrazine, hydrazide, hydroxyl, alkenyl, alkynyl, ether, thiol, sulfonyl, seleno, ester, thioacid, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, hydroxylamine, keto, or amino substituted amino acid, or any combination thereof; an amino acid with a photoactivatable cross-linker; a spin-labeled amino acid; a fluorescent amino acid; an amino acid with a novel functional group; an amino acid that covalently or noncovalently interacts with another molecule; a metal binding amino acid; a metal-containing amino acid; a radioactive amino acid; a photocaged amino acid; a photoisomerizable amino acid; a biotin or biotin-analogue containing amino acid; a glycosylated or carbohydrate modified amino acid; a keto containing amino acid; an amino acid comprising polyethylene glycol; an amino acid comprising polyether; a heavy atom substituted amino acid; a chemically cleavable or photocleavable amino acid; an amino acid with an elongated side chain; an amino acid containing a toxic group; a sugar substituted amino acid, e.g., a sugar substituted serine or the like; a carbon-linked sugar-containing amino acid; a redox-active amino acid; an α-hydroxy containing acid; an amino thio acid containing amino acid; an α,α disubstituted amino acid; a β-amino acid; and a cyclic amino acid other than proline.

V. Aminoacyl-tRNA Synthetases

The aminoacyl-tRNA synthetase (used interchangeably herein with AARS or “synthetase”) used in the methods of the invention can be a naturally occurring synthetase derived from a different organism, a mutated synthetase, or a designed synthetase.

The synthetase used can recognize the desired (unnatural) amino acid analog selectively over related amino acids available to the cell. For example, when the amino acid analog to be used is structurally related to a naturally occurring amino acid in the cell, the synthetase should charge the orthogonal tRNA molecule with the desired amino acid analog with an efficiency at least substantially equivalent to that of, and more preferably at least about twice, 3 times, 4 times, 5 times or more than that of the naturally occurring amino acid. However, in cases in which a well-defined protein product is not necessary, the synthetase can have relaxed specificity for charging amino acids. In such an embodiment, a mixture of orthogonal tRNAs could be produced, charged with various amino acids or analogs.

In certain embodiments, it is preferable that the synthetase have activity both for the amino acid analog and for the amino acid that is encoded by the degenerate codon of the orthogonal tRNA molecule. In the absence of the amino acid analog, this allows the cell to continue to grow, while addition of the amino acid analog to the cell switches on incorporation of the analog. The synthetase also should be relatively specific for the orthogonal tRNA molecule over other naturally occurring tRNA molecules within the cell. Choosing a tRNA-synthetase pair from an unrelated organism will generally allow for such selectivity. The selectivity of the synthetase for the orthogonal tRNA can be tested experimentally by testing the ability of the orthogonal synthetase to charge the natural tRNAs of the host cell with canonical amino acids. (Orthogonality can be confirmed even with natural amino acids, because the tRNA recognition domain of the synthetase may be different from the domain that recognizes amino acid analogs. Of course, once the binding site of the synthetase is appropriately modified, amino acid analogs should be charged efficiently only onto the orthogonal tRNA.) Such procedures are described, for example, in Doctor and Mudd, J. Biol. Chem. 238: 3677-3681, 1963; Wang et al., Science 292: 498-500, 2001.

The method involves introduction into the host cell of a heterologous aminoacyl-tRNA synthetase and its cognate tRNA. If cross-charging between the heterologous pair and the translational apparatus of the host is slow or absent, and if the analogue is charged only by the heterologous synthetase, insertion of the analog can be restricted (or at least biased) to sites characterized by the most productive base-pairing between the heterologous tRNA and the messenger RNA of interest.

A synthetase can be obtained by a variety of techniques known to one of skill in the art, including combinations of such techniques as, for example, computational methods, selection methods, and incorporation of synthetases from other organisms (see below).

In certain embodiments, synthetases can be used or developed that efficiently charge tRNA molecules that are not charged by synthetases of the host cell. For example, suitable pairs may be generally developed through modification of synthetases from organisms distinct from the host cell.

In certain embodiments, the synthetase can be developed by selection procedures.

In certain embodiments, the synthetase can be designed using computational techniques such as those described in Datta et al., J. Am. Chem. Soc. 124: 5652-5653, 2002, and in copending U.S. patent application Ser. No. 10/375,298 (or US patent application publication US20040053390A1, see below).

1. Computational Design of AARS

Specifically, in one embodiment, the subject method partly depends on the design and engineering of a natural AARS into a modified form that has relaxed substrate specificity, such that it can take up a non-canonical amino acid analog as a substrate and charge a modified tRNA (with its anticodon changed) with that non-canonical amino acid. The following sections briefly describe a method for the generation of such modified AARS, which method is described in more detail in US patent application publication US20040053390A1, the entire contents of which are incorporated herein by reference.

Briefly, the methods described therein relate to computational tools for modifying the substrate specificity of an aminoacyl-tRNA synthetase (AARS) through mutation to enable the enzyme to more efficiently utilize amino acid analog(s) in protein translation systems, either in vitro or in whole cells. A salient feature of the described invention is a set of methods and tools for systematically redesigning the substrate binding site of an AARS enzyme to facilitate the use of unnatural substrates in the peptide or protein translation reaction the enzyme catalyzes.

According to the method, a rotamer library for the artificial amino acid is built by varying its torsional angles to create rotamers that would fit in the binding pocket for the natural substrate. The geometric orientation of the backbone of the amino acid analog is specified by the crystallographic orientation of the backbone of the natural substrate in the crystal structure. Amino acids in the binding pocket of the synthetase that interact with the side chain on the analog are allowed to vary in identity and rotameric conformation in the subsequent protein design calculations.
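The rotamer-library step can be pictured with a minimal Python sketch. It is an illustration only, not the procedure of the incorporated publication: the 60-degree sampling step, the function names, and the `fits_pocket` predicate (a stand-in for whatever steric test decides that a rotamer fits the binding pocket) are all assumptions introduced here.

```python
import itertools


def build_rotamer_library(n_torsions, step_deg=60.0):
    """Enumerate candidate rotamers as tuples of side-chain torsion angles.

    Each torsion is sampled on a regular grid from -180 degrees (inclusive)
    to +180 degrees (exclusive). The grid spacing is an assumed parameter.
    """
    n_bins = int(round(360.0 / step_deg))
    angles = [-180.0 + i * step_deg for i in range(n_bins)]
    return list(itertools.product(angles, repeat=n_torsions))


def filter_rotamers(rotamers, fits_pocket):
    """Keep only rotamers accepted by a caller-supplied (hypothetical) fit test."""
    return [rotamer for rotamer in rotamers if fits_pocket(rotamer)]


# Usage: two rotatable torsions, then a toy filter standing in for a steric check.
library = build_rotamer_library(n_torsions=2, step_deg=60.0)
kept = filter_rotamers(library, lambda r: all(abs(a) <= 60.0 for a in r))
```

In a real calculation the filter would evaluate clashes against the pocket coordinates taken from the crystal structure; the toy predicate above only shows where that test plugs in.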

The protocol also employs a computational method to enhance the interactions between the substrate and the protein positions. This is done by scaling up the pair-wise energies between the substrate and the amino acids allowed at the design positions on the protein in the energy calculations. In an optimization calculation where the protein-substrate interactions are scaled up compared to the intra-protein interactions, sequence selection is biased toward amino acids that have favorable interactions with the substrate.
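The effect of scaling the substrate interaction terms can be shown with a small Python sketch. The energy values, the candidate names, and the scale factor of 3 are invented for illustration; only the idea of weighting protein-substrate pair energies more heavily than intra-protein energies comes from the text.

```python
def total_energy(intra_protein, protein_substrate, scale=1.0):
    """Sum pair-wise energies, weighting protein-substrate terms by `scale`."""
    return sum(intra_protein) + scale * sum(protein_substrate)


def pick_sequence(candidates, scale):
    """Return the name of the lowest-energy candidate under the given scaling."""
    best = min(
        candidates,
        key=lambda c: total_energy(c["intra"], c["substrate"], scale),
    )
    return best["name"]


# Hypothetical candidates: A packs better internally, B contacts the substrate better.
candidates = [
    {"name": "A", "intra": [-10.0], "substrate": [-1.0]},
    {"name": "B", "intra": [-8.0], "substrate": [-2.5]},
]
unscaled_choice = pick_sequence(candidates, scale=1.0)  # A: -11.0 vs B: -10.5
scaled_choice = pick_sequence(candidates, scale=3.0)    # A: -13.0 vs B: -15.5
```

With the unscaled energies the internally better-packed candidate wins; once the substrate terms are scaled up, the candidate with the more favorable substrate contacts is selected instead.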

The described method helped to construct a new modified form of the E. coli phenylalanyl-tRNA synthetase, based on the known structure of the related Thermus thermophilus PheRS (tPheRS). The new modified form of the E. coli phenylalanyl-tRNA synthetase (ePheRS**) allows efficient in vivo incorporation of reactive aryl ketone functionality into recombinant proteins. The results described therein also demonstrate the general power of computational protein design in the development of aminoacyl-tRNA synthetases for activation and charging of unnatural amino acids.

A. Available Sequence and Structural Information for tRNA Synthetases

Protein translation from an mRNA template is carried out by ribosomes. During the translation process, each tRNA is matched with its amino acid long before it reaches the ribosome. The match is made by a collection of enzymes known as the aminoacyl-tRNA synthetases (AARS). These enzymes charge each tRNA with the proper amino acid, thus allowing each tRNA to make the proper translation from the genetic code of DNA (and the mRNA transcribed from the DNA) into the amino acid code of proteins.

Most cells make twenty different aminoacyl-tRNA synthetases, one for each type of amino acid. Each of these twenty enzymes is optimized for function with its own particular amino acid and the set of tRNA molecules appropriate to that amino acid. Aminoacyl-tRNA synthetases must perform their tasks with high accuracy. Many of these enzymes recognize their tRNA molecules using the anticodon. These enzymes make about one mistake in 10,000. For most amino acids, this level of accuracy is not too difficult to achieve, since most of the amino acids are quite different from one another.

In the subject method, an accurate description of the AARS binding pocket for tRNA is important for the computational design approach, since it depends on the crystal structure for the protein backbone descriptions, although in many cases it is perfectly acceptable to use the crystal structure of a homologous protein (for example, a homolog from a related species) or even a conserved domain to substitute for the crystallographic binding pocket structure description. The crystal structure also defines the orientation of the natural substrate amino acid in the binding pocket of a synthetase, as well as the relative position of the amino acid substrate to the synthetase residues, especially those residues in and around the binding pocket. To design the binding pocket for the analogs, it is preferred that these analogs bind to the synthetase in the same orientation as the natural substrate amino acid, since this orientation may be important for the adenylation step.

The AARSs may be from any organism, including prokaryotes and eukaryotes, with enzymes from bacteria, fungi, extremophiles such as the archaebacteria, worms, insects, fish, amphibians, birds, animals (particularly mammals and particularly human) and plants all possible.

As described above, most cells make twenty different aminoacyl-tRNA synthetases, one for each type of amino acid. Some suitable synthetases are known, including: yeast phenylalanyl-tRNA synthetase (Kwon et al., J. Am. Chem. Soc. 125: 7512-7513, 2003); Methanococcus jannaschii tyrosyl-tRNA synthetase (Wang et al., Science 292: 498-500, 2001); and yeast tyrosyl-tRNA synthetase (Ohno et al., J. Biochem. 130: 417-423, 2001). In fact, the crystal structures of nearly all 20 different AARS enzymes are currently available in the Brookhaven Protein Data Bank (PDB, see Bernstein et al., J. Mol. Biol. 112: 535-542, 1977). A list of all the AARSs with solved crystal structures as of April 2001 is available on the PDB website. For example, the crystal structure of Thermus aquaticus Phenylalanyl tRNA Synthetase complexed with Phenylalanine has a resolution of 2.7 Å, and its PDB ID is 1B70.

The structure database or Molecular Modeling DataBase (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). The NCBI (National Center for Biotechnology Information) has cross-linked structural data to bibliographic information, to the sequence databases, and to the NCBI taxonomy. Cn3D, the NCBI 3D structure viewer, can be used for easy interactive visualization of molecular structures from Entrez.

The Entrez 3D Domains database contains protein domains from the NCBI Conserved Domain Database (CDD). Computational biologists define conserved domains based on recurring sequence patterns or motifs. CDD currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI, such as COG. The source databases also provide descriptions and links to citations. Since conserved domains correspond to compact structural units, CDs contain links to 3D-structure via Cn3D whenever possible.

To identify conserved domains in a protein sequence, the CD-Search service employs the reverse position-specific BLAST algorithm. The query sequence is compared to a position-specific score matrix prepared from the underlying conserved domain alignment. Hits may be displayed as a pairwise alignment of the query sequence with a representative domain sequence, or as a multiple alignment. CD-Search now is run by default in parallel with protein BLAST searches. While the user waits for the BLAST queue to further process the request, the domain architecture of the query may already be studied. In addition, CDART, the Conserved Domain Architecture Retrieval Tool, allows users to search for proteins with similar domain architectures. CDART uses precomputed CD-Search results to quickly identify proteins with a set of domains similar to that of the query. For more details, see Marchler-Bauer et al., Nucleic Acids Research 31: 383-387, 2003; and Marchler-Bauer et al., Nucleic Acids Research 30: 281-283, 2002.

In addition, a database of known aminoacyl tRNA synthetases has been published by Maciej Szymanski, Marzanna A. Deniziak and Jan Barciszewski, in Nucleic Acids Res. 29:288-290, 2001 (titled “Aminoacyl-tRNA synthetases database”). A corresponding website (http://rose.man.poznan.pl/aars/seq_main.html) provides details about all known AARSs from different species. For example, according to the database, the Isoleucyl-tRNA Synthetase for the radioresistant bacterium Deinococcus radiodurans (Accession No. AAF10907) has 1078 amino acids, and was published by White et al. in Science 286:1571-1577 (1999); the Valyl-tRNA Synthetase for mouse (Mus musculus) has 1263 amino acids (Accession No. AAD26531), and was published by Snoek M. and van Vugt H. in Immunogenetics 49: 468-470 (1999); and the Phenylalanyl-tRNA Synthetase sequences for human, Drosophila, S. pombe, S. cerevisiae, Candida albicans, E. coli, and numerous other bacteria including Thermus aquaticus ssp. thermophilus are also available. The database was last updated in September 2003. Similar information for other newly identified AARSs can be obtained, for example, by conducting a BLAST search using any of the known sequences in the AARS database as query against the available public (such as the non-redundant database at NCBI, or “nr”) or proprietary private databases.

Alternatively, in certain embodiments, the exact crystal structure of a particular AARS is not known, but its protein sequence is similar or homologous to a known AARS sequence with a known crystal structure. In such instances, it is expected that the conformation of the AARS in question will be similar to the known crystal structure of the homologous AARS. The known structure may, therefore, be used as the structure for the AARS of interest, or more preferably, may be used to predict the structure of the AARS of interest (i.e., in “homology modeling” or “molecular modeling”). As a particular example, the Molecular Modeling Database (MMDB) described above (see, Wang et al., Nucl. Acids Res. 2000, 28:243-245; Marchler-Bauer et al., Nucl. Acids Res. 1999, 27:240-243) provides search engines that may be used to identify proteins and/or nucleic acids that are similar or homologous to a protein sequence (referred to as “neighboring” sequences in the MMDB), including neighboring sequences whose three-dimensional structures are known. The database further provides links to the known structures along with alignment and visualization tools, such as Cn3D (developed by NCBI), RasMol, etc., whereby the homologous and parent sequences may be compared and a structure may be obtained for the parent sequence based on such sequence alignments and known structures.

The homologous AARS sequence with known 3D-structure is preferably at least about 60%, or at least about 70%, or at least about 80%, or at least about 90%, or at least about 95% identical to the AARS of interest in the active site region or the pocket region for amino acid substrate binding. Such active site or pocket site may not be continuous in the primary amino acid sequence of the AARS since distant amino acids may come together in the 3D-structure. In this case, sequence homology or identity can be calculated using, for example, the NCBI standard BLASTp programs for protein using default conditions, in regions aligned together (without insertions or deletions in either of the two sequences being compared) and including residues known to be involved in substrate amino acid binding. For example, the Thermus aquaticus Phenylalanyl tRNA Synthetase alpha subunit appears to have an “insert” region from residues 156 to 165 when compared to its homologs from other species. This region can be disregarded in calculating sequence identity. Alternatively, the homologous AARS is preferably about 35%, or 40%, or 45%, or 50%, or 55% identical overall to the AARS of interest. The E. coli Phenylalanyl tRNA Synthetase alpha subunit is about 45% identical overall, and about 80% identical in the active site region to the Thermus aquaticus Phenylalanyl tRNA Synthetase. The human Phenylalanyl tRNA Synthetase alpha subunit is about 62%, 60%, 54%, 50% identical overall to its Drosophila, worm (C. elegans), plant (Arabidopsis thaliana), yeast (S. cerevisiae) counterparts, respectively.
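As a rough illustration of computing identity only over regions aligned without insertions or deletions, the following Python sketch counts matches in gap-free columns of a pre-computed pairwise alignment. It is not the BLAST algorithm and does not reproduce BLAST statistics; the function name, the `-` gap character, the `skip` argument, and the toy sequences are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, skip=()):
    """Percent identity over alignment columns that contain no gap.

    `aligned_a` and `aligned_b` are equal-length aligned strings using '-'
    for gaps. `skip` lists column indices to disregard, e.g. an insert region.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    skipped = set(skip)
    compared = 0
    matches = 0
    for index, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        if index in skipped or a == "-" or b == "-":
            continue  # ignore insertions, deletions, and excluded columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment: one gap column is ignored, one mismatch remains.
identity = percent_identity("ACDEFG", "ACD-FA")
identity_without_last = percent_identity("ACDEFG", "ACD-FA", skip=[5])
```

Restricting the comparison to active-site columns would amount to passing every other column index in `skip`.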

In the few cases where the structure for a particular AARS sequence may not be known or available, it is typically possible to determine the structure using routine experimental techniques (for example, X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy) and without undue experimentation. See, e.g., NMR of Macromolecules: A Practical Approach, G. C. K. Roberts, Ed., Oxford University Press Inc., New York (1993); Ishima and Torchia, Nat. Struct. Biol. 7: 740-743, 2000; Gardner and Kay, Annu. Rev. Bioph. Biom. 27: 357-406, 1998; Kay, Biochem. Cell. Biol. 75: 1-15, 1997; Dayie et al., Annu. Rev. Phys. Chem. 47: 243-282, 1996; Wuthrich, Acta Crystallogr. D 51: 249-270, 1995; Kahn et al., J. Synchrotron Radiat. 7: 131-138, 2000; Oakley and Wilce, Clin. Exp. Pharmacol. P. 27: 145-151, 2000; Fourme et al., J. Synchrotron Radiat. 6: 834-844, 1999.

Alternatively, and in less preferable embodiments, the three-dimensional structure of an AARS sequence may be calculated from the sequence itself using ab initio molecular modeling techniques already known in the art. See e.g., Smith et al., J. Comput. Biol. 4: 217-225, 1997; Eisenhaber et al., Proteins 24: 169-179, 1996; Bohm, Biophys Chem. 59: 1-32, 1996; Fetrow and Bryant, BioTechnol. 11: 479-484, 1993; Swindells and Thornton, Curr. Opin. Biotech. 2: 512-519, 1991; Levitt et al., Annu. Rev. Biochem. 66: 549-579, 1997; Eisenhaber et al., Crit. Rev. Biochem. Mol. 30:1-94, 1995; Xia et al., J. Mol. Biol. 300: 171-185, 2000; Jones, Curr. Opin. Struc. Biol. 10: 371-379, 2000. Three-dimensional structures obtained from ab initio modeling are typically less reliable than structures obtained using empirical (e.g., NMR spectroscopy or X-ray crystallography) or semi-empirical (e.g., homology modeling) techniques. However, such structures will generally be of sufficient quality, although less preferred, for use in the methods of this invention.

For additional details, see section B below.

B. Methods for Predicting 3D Structure Based on Sequence Homology

For AARS proteins that have not been crystallized or been the focus of other structural determinations, a computer-generated molecular model of the AARS and its binding site can nevertheless be generated using any of a number of techniques available in the art. For example, the Cα-carbon positions of the target AARS sequence can be mapped to a particular coordinate pattern of an AARS enzyme (“known AARS”) having a similar sequence and deduced structure using homology modeling techniques, and the structure of the target protein and velocities of each atom calculated at a simulation temperature (To) at which a docking simulation with an amino acid analog is to be determined. Typically, such a protocol involves primarily the prediction of side-chain conformations in the modeled target AARS protein, while assuming a main-chain trace taken from a tertiary structure, such as provided by the known AARS protein. Computer programs for performing energy minimization routines are commonly used to generate molecular models. For example, both the CHARMM (Brooks et al. (1983) J Comput Chem 4:187-217) and AMBER (Weiner et al. (1981) J. Comput. Chem. 106: 765) algorithms handle all of the molecular system setup, force field calculation, and analysis (see also, Eisenfield et al. (1991) Am J Physiol 261:C376-386; Lybrand (1991) J Pharm Belg 46:49-54; Froimowitz (1990) Biotechniques 8:640-644; Burbam et al. (1990) Proteins 7:99-111; Pedersen (1985) Environ Health Perspect 61:185-190; and Kini et al. (1991) J Biomol Struct Dyn 9:475-488). At the heart of these programs is a set of subroutines that, given the position of every atom in the model, calculate the total potential energy of the system and the force on each atom. These programs may utilize a starting set of atomic coordinates, the parameters for the various terms of the potential energy function, and a description of the molecular topology (the covalent structure). Common features of such molecular modeling methods include: provisions for handling hydrogen bonds and other constraint forces; the use of periodic boundary conditions; and provisions for occasionally adjusting positions, velocities, or other parameters in order to maintain or change temperature, pressure, volume, forces of constraint, or other externally controlled conditions.

Most conventional energy minimization methods use the input coordinate data and the fact that the potential energy function is an explicit, differentiable function of Cartesian coordinates, to calculate the potential energy and its gradient (which gives the force on each atom) for any set of atomic positions. This information can be used to generate a new set of coordinates in an effort to reduce the total potential energy and, by repeating this process over and over, to optimize the molecular structure under a given set of external conditions. These energy minimization methods are routinely applied to molecules similar to the subject AARS proteins.
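The repeated "compute energy and gradient, then move the coordinates downhill" loop described above can be sketched in Python with plain steepest descent. This is a generic illustration, not the minimizer of CHARMM or AMBER; the one-dimensional harmonic bond, its force constant and rest length, and the step size are assumed values chosen so the toy problem converges.

```python
def minimize(coords, energy_and_gradient, step=0.002, max_iter=10000, tol=1e-8):
    """Steepest descent: move each coordinate against the energy gradient."""
    x = list(coords)
    for _ in range(max_iter):
        _, gradient = energy_and_gradient(x)
        if max(abs(g) for g in gradient) < tol:
            break  # forces (negative gradient) are effectively zero
        x = [xi - step * gi for xi, gi in zip(x, gradient)]
    final_energy, _ = energy_and_gradient(x)
    return x, final_energy


def harmonic_bond(x, k=100.0, r0=1.5):
    """Toy potential: two particles on a line joined by a harmonic bond.

    E = 0.5 * k * (r - r0)^2 with r = x[1] - x[0]; returns (E, dE/dx).
    """
    r = x[1] - x[0]
    de_dr = k * (r - r0)
    return 0.5 * k * (r - r0) ** 2, [-de_dr, de_dr]


# Start with a compressed bond (length 1.0) and relax it toward r0 = 1.5.
relaxed, energy = minimize([0.0, 1.0], harmonic_bond)
bond_length = relaxed[1] - relaxed[0]
```

Production minimizers typically use conjugate-gradient or quasi-Newton updates with a line search rather than a fixed step, but the energy-and-gradient interface is the same.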

In general, energy minimization methods can be carried out for a given temperature, Ti, which may be different than the docking simulation temperature, To. Upon energy minimization of the molecule at Ti, coordinates and velocities of all the atoms in the system are computed. Additionally, the normal modes of the system are calculated. It will be appreciated by those skilled in the art that each normal mode is a collective, periodic motion, with all parts of the system moving in phase with each other, and that the motion of the molecule is the superposition of all normal modes. For a given temperature, the mean square amplitude of motion in a particular mode is inversely proportional to the effective force constant for that mode, so that the motion of the molecule will often be dominated by the low frequency vibrations.
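The inverse proportionality stated above follows from classical equipartition applied to a harmonic mode. The symbols below are introduced here for illustration and do not appear in the original text: $q_j$ is the displacement along normal mode $j$, $k_j$ its effective force constant, $m_j$ its effective mass, $\omega_j$ its angular frequency, $k_B$ the Boltzmann constant, and $T$ the temperature.

```latex
% Equipartition: the mean potential energy of a classical harmonic mode is k_B T / 2
\tfrac{1}{2}\, k_j \,\langle q_j^{2} \rangle = \tfrac{1}{2}\, k_B T
\quad\Longrightarrow\quad
\langle q_j^{2} \rangle = \frac{k_B T}{k_j} = \frac{k_B T}{m_j\, \omega_j^{2}}
```

Because $k_j = m_j \omega_j^{2}$, modes with low frequency have small force constants and therefore the largest mean square amplitudes, which is why they tend to dominate the overall motion.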

After the molecular model has been energy minimized at Ti, the system is “heated” or “cooled” to the simulation temperature, To, by carrying out an equilibration run where the velocities of the atoms are scaled in a step-wise manner until the desired temperature, To, is reached. The system is further equilibrated for a specified period of time until certain properties of the system, such as average kinetic energy, remain constant. The coordinates and velocities of each atom are then obtained from the equilibrated system.
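A step-wise velocity scaling of this kind might look like the Python sketch below. It works in reduced units (Boltzmann constant set to 1 by default), assumes three degrees of freedom per atom with no constraints, and omits the dynamics that a real equilibration run would perform between scaling steps; all function names are hypothetical.

```python
def kinetic_temperature(masses, velocities, kb=1.0):
    """Instantaneous temperature from kinetic energy: T = 2*KE / (3N * kb)."""
    kinetic = 0.5 * sum(
        m * sum(component * component for component in v)
        for m, v in zip(masses, velocities)
    )
    return 2.0 * kinetic / (3 * len(masses) * kb)


def rescale(velocities, t_current, t_target):
    """Scale every velocity by sqrt(T_target / T_current)."""
    factor = (t_target / t_current) ** 0.5
    return [[factor * component for component in v] for v in velocities]


def ramp_temperature(masses, velocities, t_final, n_steps, kb=1.0):
    """Bring the system from its current temperature to t_final in n_steps."""
    t_start = kinetic_temperature(masses, velocities, kb)
    for i in range(1, n_steps + 1):
        t_step = t_start + (t_final - t_start) * i / n_steps
        t_now = kinetic_temperature(masses, velocities, kb)
        velocities = rescale(velocities, t_now, t_step)
        # A real run would integrate the equations of motion here before rescaling again.
    return velocities


# Toy system: two unit-mass atoms, heated from T = 1/3 to T = 1 in four steps.
masses = [1.0, 1.0]
start = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
heated = ramp_temperature(masses, start, t_final=1.0, n_steps=4)
```

The sketch requires a non-zero starting temperature, since a system at rest cannot be heated by multiplying its velocities.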

Further energy minimization routines can also be carried out. For example, a second class of methods involves calculating approximate solutions to the constrained equations of motion (EOM) for the protein. These methods use an iterative approach to solve for the Lagrange multipliers and, typically, only need a few iterations if the corrections required are small. The most popular method of this type, SHAKE (Ryckaert et al. (1977) J Comput Phys 23:327; and Van Gunsteren et al. (1977) Mol Phys 34:1311) is easy to implement and scales as O(N) as the number of constraints increases. Therefore, the method is applicable to macromolecules such as AARS proteins. An alternative method, RATTLE (Andersen (1983) J Comput Phys 52:24) is based on the velocity version of the Verlet algorithm. Like SHAKE, RATTLE is an iterative algorithm and can be used to energy minimize the model of a subject AARS protein.
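The iterative solution for a Lagrange multiplier can be illustrated with a SHAKE-style correction for a single bond-length constraint between two atoms. This Python sketch is a simplified teaching example under stated assumptions (one constraint, Cartesian lists, arbitrary units); it is not taken from the cited papers or from any modeling package, and the function name is hypothetical.

```python
def shake_bond(new_i, new_j, old_i, old_j, m_i, m_j, d, tol=1e-10, max_iter=100):
    """Restore |r_i - r_j| = d after an unconstrained position update.

    Corrections are applied along the bond vector from the previous step,
    with a multiplier g obtained by linearizing the constraint and iterating.
    """
    r_i, r_j = list(new_i), list(new_j)
    old_bond = [a - b for a, b in zip(old_i, old_j)]
    inv_mass = 1.0 / m_i + 1.0 / m_j
    for _ in range(max_iter):
        bond = [a - b for a, b in zip(r_i, r_j)]
        violation = d * d - sum(c * c for c in bond)
        if abs(violation) < tol:
            break
        dot = sum(c * o for c, o in zip(bond, old_bond))
        g = violation / (2.0 * inv_mass * dot)
        r_i = [a + (g / m_i) * o for a, o in zip(r_i, old_bond)]
        r_j = [b - (g / m_j) * o for b, o in zip(r_j, old_bond)]
    return r_i, r_j


# A bond of length 1 was stretched by an unconstrained step; pull it back.
fixed_i, fixed_j = shake_bond(
    new_i=[0.0, 0.1, 0.0], new_j=[1.2, 0.0, 0.0],
    old_i=[0.0, 0.0, 0.0], old_j=[1.0, 0.0, 0.0],
    m_i=1.0, m_j=1.0, d=1.0,
)
length = sum((a - b) ** 2 for a, b in zip(fixed_i, fixed_j)) ** 0.5
```

With many coupled constraints the same correction is swept over each constraint in turn until all are satisfied, which is where the small number of iterations for small corrections becomes relevant.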

C. Alternative Methods

In other embodiments, rather than holding the identity of the amino acid analog constant and varying the AARS structure (by modeling several different mutant structures), the subject method is carried out using the molecular model(s) for a single Modified AARS (e.g., in which one or more non-anchor amino acid residues are changed) and sampling a variety of different amino acid analogs or potential fragments thereof, to identify analogs which are likely to interact with, and be substrates for, the modified AARS enzyme. This approach can make use of coordinate libraries for amino acid analogs (including rotamer variants) or libraries of functional groups and spacers that can be joined to form the side-chain of an amino acid analog.

Using such approaches as described above, e.g., homology modeling, a coordinate set for the binding site for the modified AARS can be derived.

There are a variety of computational methods that can be readily adapted for identifying the structure of amino acid analogs that would have appropriate steric and electronic properties to interact with the substrate binding site of a Modified AARS. See, for example, Cohen et al. (1990) J. Med. Chem. 33: 883-894; Kuntz et al. (1982) J. Mol. Biol. 161: 269-288; DesJarlais (1988) J. Med. Chem. 31: 722-729; Bartlett et al. (1989) (Spec. Publ., Roy. Soc. Chem.) 78: 182-196; Goodford et al. (1985) J. Med. Chem. 28: 849-857; DesJarlais et al. J. Med. Chem. 29: 2149-2153. Directed methods generally fall into two categories: (1) design by analogy in which 3-D structures of known molecules (such as from a crystallographic database) are docked to the AARS binding site structure and scored for goodness-of-fit; and (2) de novo design, in which the amino acid analog model is constructed piece-wise in the AARS binding site. The latter approach, in particular, can facilitate the development of novel molecules, uniquely designed to bind to the subject Modified AARS binding site.

In an illustrative embodiment, the design of potential amino acid analogs that may function with a particular modified AARS begins from the general perspective of shape complementarity for the substrate binding site of the enzyme, and a search algorithm is employed which is capable of scanning a database of small molecules of known three-dimensional structure for candidates which fit geometrically into the substrate binding site. Such libraries can be general small molecule libraries, or can be libraries directed to amino acid analogs or small molecules which can be used to create amino acid analogs. It is not expected that the molecules found in the shape search will necessarily be leads themselves, since no evaluation of chemical interaction need necessarily be made during the initial search. Rather, it is anticipated that such candidates might act as the framework for further design, providing molecular skeletons to which appropriate atomic replacements can be made. Of course, the chemical complementarity of these molecules can be evaluated, but it is expected that atom types will be changed to maximize the electrostatic, hydrogen bonding, and hydrophobic interactions with the substrate binding site. Most algorithms of this type provide a method for finding a wide assortment of chemical structures that may be complementary to the shape of the AARS substrate binding site.

For instance, each of a set of small molecules from a particular database, such as the Cambridge Crystallographic Data Bank (CCDB) (Allen et al. (1973) J. Chem. Doc. 13: 119), is individually docked to the binding site of the modified AARS in a number of geometrically permissible orientations with use of a docking algorithm. In a preferred embodiment, a set of computer algorithms called DOCK can be used to characterize the shape of invaginations and grooves that form the binding site. See, for example, Kuntz et al. (1982) J. Mol. Biol. 161: 269-288. The program can also search a database of small molecules for templates whose shapes are complementary to a particular binding site of the modified AARS. Exemplary algorithms that can be adapted for this purpose are described in, for example, DesJarlais et al. (1988) J Med Chem 31:722-729.

The orientations are evaluated for goodness-of-fit and the best are keptfor further examination using molecular mechanics programs, such asAMBER or CHARMM. Such algorithms have previously proven successful infinding a variety of molecules that are complementary in shape to agiven binding site of a receptor or enzyme, and have been shown to haveseveral attractive features. First, such algorithms can retrieve aremarkable diversity of molecular architectures. Second, the beststructures have, in previous applications to other proteins,demonstrated impressive shape complementarity over an extended surfacearea. Third, the overall approach appears to be quite robust withrespect to small uncertainties in positioning of the candidate atoms.
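By way of illustration only, the evaluation of docked orientations for goodness-of-fit can be sketched as follows. The scoring rule, the distance cutoffs (`clash`, `contact`), the function names, and the coordinates are all assumptions made for this sketch; this is not the scoring function of DOCK or of any other named program.

```python
import math

def shape_score(ligand_atoms, site_atoms, clash=2.5, contact=4.5):
    """Toy shape score for one orientation.

    +1 for every ligand/site atom pair within contact range,
    -10 for every pair closer than the clash distance (steric overlap).
    Distances are in angstroms; both cutoffs are illustrative assumptions.
    """
    score = 0.0
    for la in ligand_atoms:
        for sa in site_atoms:
            d = math.dist(la, sa)
            if d < clash:
                score -= 10.0
            elif d <= contact:
                score += 1.0
    return score

def rank_orientations(orientations, site_atoms, keep=2):
    """Score every orientation and keep the best few for further examination."""
    scored = [(shape_score(atoms, site_atoms), idx)
              for idx, atoms in enumerate(orientations)]
    scored.sort(reverse=True)
    return scored[:keep]

# Hypothetical two-atom binding site and three one-atom "orientations".
site = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
orientations = [
    [(2.0, 3.0, 0.0)],   # contacts both site atoms
    [(1.0, 0.0, 0.0)],   # clashes with the first site atom
    [(2.0, 10.0, 0.0)],  # too far away to make any contact
]
top = rank_orientations(orientations, site)
print(top)
```

In such a scheme, only the retained orientations would be passed on to a molecular mechanics program for more detailed evaluation.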

In certain embodiments, the subject method can utilize an algorithmdescribed by Goodford (1985, J Med Chem 28:849-857) and Boobbyer et al.(1989, J Med Chem 32:1083-1094). Those papers describe a computerprogram (GRID) which seeks to determine regions of high affinity fordifferent chemical groups (termed probes) on the molecular surface ofthe binding site. GRID hence provides a tool for suggestingmodifications to known ligands that might enhance binding. It may beanticipated that some of the sites discerned by GRID as regions of highaffinity correspond to “pharmacophoric patterns” determinedinferentially from a series of known ligands. As used herein, apharmacophoric pattern is a geometric arrangement of features of theanticipated amino acid analog that is believed to be important forbinding. Goodsell and Olson (1990, Proteins: Struct Funct Genet.8:195-202) have used the Metropolis (simulated annealing) algorithm todock a single known ligand into a target protein, and their approach canbe adapted for identifying suitable amino acid analogs for docking withthe AARS binding site. This algorithm can allow torsional flexibility inthe amino acid side-chain and use GRID interaction energy maps as rapidlookup tables for computing approximate interaction energies.
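By way of illustration only, the use of a Metropolis (simulated annealing) search with a precomputed interaction energy map as a lookup table can be sketched as follows. The sketch moves a single probe point on a cubic grid and does not model torsional flexibility; the toy energy map, the cooling schedule, and all names are assumptions made for this sketch and are not taken from GRID or from the cited work.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def anneal_on_grid(energy_grid, start, steps=500, t_start=5.0, t_end=0.1, seed=0):
    """Random walk on grid points; energies are read from a lookup table."""
    rng = random.Random(seed)
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    pos = best = start
    for k in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        trial = tuple(p + m for p, m in zip(pos, rng.choice(moves)))
        if trial not in energy_grid:
            continue  # trial point lies outside the precomputed map
        if metropolis_accept(energy_grid[trial] - energy_grid[pos], t, rng):
            pos = trial
            if energy_grid[pos] < energy_grid[best]:
                best = pos
    return best, energy_grid[best]

# Hypothetical energy map: a single bowl centered on grid point (3, 3, 3).
grid = {(x, y, z): float((x - 3) ** 2 + (y - 3) ** 2 + (z - 3) ** 2)
        for x in range(7) for y in range(7) for z in range(7)}
start = (0, 0, 0)
best_pos, best_e = anneal_on_grid(grid, start)
print(best_pos, best_e)
```

The table lookup replaces a full energy evaluation at every step, which is the reason such maps are described as rapid lookup tables.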

Yet a further embodiment of the present invention utilizes a computeralgorithm such as CLIX which searches such databases as CCDB for smallmolecules which can be oriented in the substrate binding site of theAARS in a way that is both sterically acceptable and has a highlikelihood of achieving favorable chemical interactions between thecandidate molecule and the surrounding amino acid residues. The methodis based on characterizing the substrate binding site in terms of anensemble of favorable binding positions for different chemical groupsand then searching for orientations of the candidate molecules thatcause maximum spatial coincidence of individual candidate chemicalgroups with members of the ensemble. The current availability ofcomputer power dictates that a computer-based search for novel ligandsfollows a breadth-first strategy. A breadth-first strategy aims toreduce progressively the size of the potential candidate search space bythe application of increasingly stringent criteria, as opposed to adepth-first strategy wherein a maximally detailed analysis of onecandidate is performed before proceeding to the next. CLIX conforms tothis strategy in that its analysis of binding is rudimentary—it seeks tosatisfy the necessary conditions of steric fit and of having individualgroups in “correct” places for bonding, without imposing the sufficientcondition that favorable bonding interactions actually occur. A ranked“shortlist” of molecules, in their favored orientations, is producedwhich can then be examined on a molecule-by-molecule basis, usingcomputer graphics and more sophisticated molecular modeling techniques.CLIX is also capable of suggesting changes to the substituent chemicalgroups of the candidate molecules that might enhance binding. Again, thestarting library can be of amino acid analogs or of molecules which canbe used to generate the side-chain of an amino acid analog.
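By way of illustration only, a breadth-first strategy of the kind described above can be sketched as a sequence of increasingly stringent filters applied to the whole candidate pool. The candidate fields, thresholds, and function names are hypothetical and are not part of CLIX.

```python
def breadth_first_screen(candidates, filters):
    """Apply each filter to the entire remaining pool before moving on,
    recording how the pool shrinks after every stage."""
    pool = list(candidates)
    history = [len(pool)]
    for _name, keep in filters:
        pool = [c for c in pool if keep(c)]
        history.append(len(pool))
    return pool, history

# Hypothetical precomputed descriptors for five database molecules.
candidates = [
    {"id": "m1", "heavy_atoms": 12, "clashes": 0, "matched_groups": 3},
    {"id": "m2", "heavy_atoms": 40, "clashes": 0, "matched_groups": 4},
    {"id": "m3", "heavy_atoms": 10, "clashes": 2, "matched_groups": 3},
    {"id": "m4", "heavy_atoms": 15, "clashes": 0, "matched_groups": 1},
    {"id": "m5", "heavy_atoms": 18, "clashes": 0, "matched_groups": 2},
]

# Cheap, permissive criteria first; stricter criteria later (assumed thresholds).
filters = [
    ("size", lambda c: c["heavy_atoms"] <= 20),
    ("steric fit", lambda c: c["clashes"] == 0),
    ("group placement", lambda c: c["matched_groups"] >= 2),
]

shortlist, history = breadth_first_screen(candidates, filters)
shortlist.sort(key=lambda c: c["matched_groups"], reverse=True)
print(history, [c["id"] for c in shortlist])
```

A depth-first strategy would instead run every analysis on one molecule before looking at the next; the ranked shortlist produced here is what would be handed to more sophisticated modeling.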

The algorithmic details of CLIX are described in Lawrence et al. (1992) Proteins 12:31-41, and the CLIX algorithm can be summarized as follows. The GRID program is used to determine discrete favorable interaction positions (termed target sites) in the binding site of the AARS protein for a wide variety of representative chemical groups. For each candidate ligand in the CCDB, an exhaustive attempt is made to make coincident, in a spatial sense in the binding site of the protein, a pair of the candidate's substituent chemical groups with a pair of corresponding favorable interaction sites proposed by GRID. All possible combinations of pairs of ligand groups with pairs of GRID sites are considered during this procedure. Upon locating such coincidence, the program rotates the candidate ligand about the two pairs of groups and checks for steric hindrance and coincidence of other candidate atomic groups with appropriate target sites. Particular candidate/orientation combinations that are good geometric fits in the binding site and show sufficient coincidence of atomic groups with GRID sites are retained.
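By way of illustration only, the exhaustive pairing of ligand groups with target sites can be sketched as follows. A ligand pair is treated as able to coincide with a site pair when the group types agree and the two inter-point distances agree within a tolerance. The tolerance, the group labels, and the coordinates are assumptions made for this sketch; the subsequent rotation about the matched pair and the steric check are not shown.

```python
import math
from itertools import combinations, permutations

def matching_pairs(ligand_groups, target_sites, tol=0.5):
    """Enumerate all (ligand pair, site pair) combinations that could coincide.

    ligand_groups and target_sites are lists of (group_type, (x, y, z)).
    Returns a list of ((i, j), (a, b)): ligand group i maps to site a,
    ligand group j maps to site b.
    """
    hits = []
    for (i, gi), (j, gj) in combinations(enumerate(ligand_groups), 2):
        d_lig = math.dist(gi[1], gj[1])
        for (a, sa), (b, sb) in permutations(enumerate(target_sites), 2):
            if gi[0] != sa[0] or gj[0] != sb[0]:
                continue  # chemical group types must correspond
            if abs(d_lig - math.dist(sa[1], sb[1])) <= tol:
                hits.append(((i, j), (a, b)))
    return hits

# Hypothetical candidate ligand and hypothetical target sites.
ligand = [
    ("hydroxyl", (0.0, 0.0, 0.0)),
    ("amine", (3.0, 0.0, 0.0)),
    ("methyl", (0.0, 5.0, 0.0)),
]
sites = [
    ("amine", (10.0, 10.0, 13.2)),
    ("hydroxyl", (10.0, 10.0, 10.0)),
    ("methyl", (20.0, 20.0, 20.0)),
]
hits = matching_pairs(ligand, sites)
print(hits)
```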

Consistent with the breadth-first strategy, this approach involves simplifying assumptions. Rigid protein and small molecule geometry is maintained throughout. As a first approximation, rigid geometry is acceptable because the energy-minimized coordinates of the binding site of the modified AARS describe an energy minimum for the molecule, albeit a local one.

A further assumption implicit in CLIX is that the potential ligand, whenintroduced into the substrate binding site of the Modified AARS, doesnot induce change in the protein's stereochemistry or partial chargedistribution and so alter the basis on which the GRID interaction energymaps were computed. It must also be stressed that the interaction sitespredicted by GRID are used in a positional and type sense only, i.e.,when a candidate atomic group is placed at a site predicted as favorableby GRID, no check is made to ensure that the bond geometry, the state ofprotonation, or the partial charge distribution favors a stronginteraction between the protein and that group. Such detailed analysisshould form part of more advanced modeling of candidates identified inthe CLIX shortlist.

Yet another embodiment of a computer-assisted molecular design method for identifying amino acid analogs that may be utilized by a predetermined Modified AARS comprises the de novo synthesis of potential inhibitors by algorithmic connection of small molecular fragments that will exhibit the desired structural and electrostatic complementarity with the substrate binding site of the enzyme. The methodology employs a large template set of small molecules which are iteratively pieced together in a model of the AARS' substrate binding site. Each stage of ligand growth is evaluated according to a molecular mechanics-based energy function, which considers van der Waals and coulombic interactions, internal strain energy of the lengthening ligand, and desolvation of both ligand and enzyme. The search space can be managed by use of a data tree which is kept under control by pruning according to the binding criteria.

In yet another embodiment, potential amino acid analogs can bedetermined using a method based on an energy minimization-quenchedmolecular dynamics algorithm for determining energetically favorablepositions of functional groups in the substrate binding site of amodified AARS enzyme. The method can aid in the design of molecules thatincorporate such functional groups by modification of known amino acidand amino acid analogs or through de novo synthesis.

For example, the multiple copy simultaneous search method (MCSS) described by Miranker et al. (1991) Proteins 11: 29-34 can be adapted for use in the subject method. To determine and characterize the local minima of a functional group in the force field of the protein, multiple copies of selected functional groups are first distributed in a binding site of interest on the AARS protein. Energy minimization of these copies by molecular mechanics or quenched dynamics yields the distinct local minima. The neighborhood of these minima can then be explored by a grid search or by constrained minimization. In one embodiment, the MCSS method uses the classical time dependent Hartree (TDH) approximation to simultaneously minimize or quench many identical groups in the force field of the protein.

Implementation of the MCSS algorithm requires a choice of functional groups and a molecular mechanics model for each of them. Groups must be simple enough to be easily characterized and manipulated (3-6 atoms, few or no dihedral degrees of freedom), yet complex enough to approximate the steric and electrostatic interactions that the functional group would have in substrate binding to the site of the AARS protein. A preferred set is, for example, one in which most organic molecules can be described as a collection of such groups (Patai's Guide to the Chemistry of Functional Groups, ed. S. Patai, New York: John Wiley and Sons, 1989). This includes fragments such as acetonitrile, methanol, acetate, methyl ammonium, dimethyl ether, methane, and acetaldehyde.

Determination of the local energy minima in the binding site requiresthat many starting positions be sampled. This can be achieved bydistributing, for example, 1,000-5,000 groups at random inside a spherecentered on the binding site; only the space not occupied by the proteinneeds to be considered. If the interaction energy of a particular groupat a certain location with the protein is more positive than a givencut-off (e.g. 5.0 kcal/mole) the group is discarded from that site.Given the set of starting positions, all the fragments are minimizedsimultaneously by use of the TDH approximation (Elber et al. (1990) J AmChem Soc 112: 9161-9175). In this method, the forces on each fragmentconsist of its internal forces and those due to the protein. Theessential element of this method is that the interactions between thefragments are omitted and the forces on the protein are normalized tothose due to a single fragment. In this way simultaneous minimization ordynamics of any number of functional groups in the field of a singleprotein can be performed.

Minimization is performed successively on subsets of, e.g. 100, of therandomly placed groups. After a certain number of step intervals, suchas 1,000 intervals, the results can be examined to eliminate groupsconverging to the same minimum. This process is repeated untilminimization is complete (e.g. RMS gradient of 0.01 kcal/mole/Å). Thusthe resulting energy minimized set of molecules comprises what amountsto a set of disconnected fragments in three dimensions representingpotential side-chains for amino acid analogs.
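By way of illustration only, the overall flow of such a multiple-copy search (random placement in a sphere, rejection of unfavorable starting positions, minimization of non-interacting copies, and elimination of copies that converge to the same minimum) can be sketched as follows. The Gaussian-well energy function stands in for a real force field, each copy is minimized by plain steepest descent rather than by the TDH approximation, and the cutoff, step size, and all names are assumptions made for this sketch.

```python
import math
import random

def energy(p, wells):
    """Toy interaction energy: a sum of Gaussian wells (not a real force field)."""
    return -sum(depth * math.exp(-math.dist(p, c) ** 2) for c, depth in wells)

def gradient(p, wells):
    """Analytic gradient of the toy energy with respect to position."""
    g = [0.0, 0.0, 0.0]
    for c, depth in wells:
        w = 2.0 * depth * math.exp(-math.dist(p, c) ** 2)
        for k in range(3):
            g[k] += w * (p[k] - c[k])
    return g

def minimize(p, wells, step=0.1, rms_tol=0.01, max_iter=5000):
    """Steepest descent until the RMS gradient falls below rms_tol."""
    p = list(p)
    for _ in range(max_iter):
        g = gradient(p, wells)
        if math.sqrt(sum(x * x for x in g) / 3.0) < rms_tol:
            break
        p = [x - step * gx for x, gx in zip(p, g)]
    return tuple(p)

def multi_copy_search(wells, center, radius, n_copies=200,
                      cutoff=-0.05, merge=0.2, seed=0):
    rng = random.Random(seed)
    minima = []
    for _ in range(n_copies):
        # Rejection-sample a starting position inside the sphere.
        while True:
            off = [rng.uniform(-radius, radius) for _ in range(3)]
            if sum(x * x for x in off) <= radius * radius:
                break
        start = tuple(c + o for c, o in zip(center, off))
        if energy(start, wells) > cutoff:
            continue  # starting energy too unfavorable: discard this copy
        end = minimize(start, wells)
        # Copies are independent (no copy-copy interactions); keep one
        # representative per distinct minimum.
        if all(math.dist(end, m) > merge for m in minima):
            minima.append(end)
    return minima

# Two hypothetical favorable positions of different depth.
wells = [((0.0, 0.0, 0.0), 5.0), ((4.0, 0.0, 0.0), 3.0)]
minima = multi_copy_search(wells, center=(2.0, 0.0, 0.0), radius=4.0)
print(minima)
```

Because the copies do not interact with one another, minimizing them one at a time in a rigid toy field gives the same minima as minimizing them together; the simultaneous treatment matters when the protein itself is allowed to respond.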

The next step then is to connect the pieces with spacers assembled from small chemical entities (atoms, chains, or ring moieties) to form amino acid analogs, e.g., each of the disconnected fragments can be linked in space to generate a single molecule using such computer programs as, for example, NEWLEAD (Tschinke et al. (1993) J Med Chem 36: 3863-3870). The procedure adopted by NEWLEAD executes the following sequence of commands: (1) connect two isolated moieties, (2) retain the intermediate solutions for further processing, (3) repeat the above steps for each of the intermediate solutions until no disconnected units are found, and (4) output the final solutions, each of which is a single molecule. Such a program can use, for example, three types of spacers: library spacers, single-atom spacers, and fused-ring spacers. The library spacers are optimized structures of small molecules such as ethylene, benzene and methylamide. The output produced by programs such as NEWLEAD consists of a set of molecules containing the original fragments now connected by spacers. The atoms belonging to the input fragments maintain their original orientations in space. The molecules are chemically plausible because of the simple makeup of the spacers and functional groups, and energetically acceptable because of the rejection of solutions with van der Waals radii violations.

In addition, the order in which the steps of the present method areperformed is purely illustrative in nature. In fact, the steps can beperformed in any order or in parallel, unless otherwise indicated by thepresent disclosure.

Furthermore, the method of the present invention may be performed ineither hardware, software, or any combination thereof, as those termsare currently known in the art. In particular, the present method may becarried out by software, firmware, or microcode operating on a computeror computers of any type. Additionally, software embodying the presentinvention may comprise computer instructions in any form (e.g., sourcecode, object code, interpreted code, etc.) stored in anycomputer-readable medium (e.g., ROM, RAM, magnetic media, punched tapeor card, compact disc (CD) in any form, DVD, etc.). Furthermore, suchsoftware may also be in the form of a computer data signal embodied in acarrier wave, such as that found within the well-known Web pagestransferred among devices connected to the Internet. Accordingly, thepresent invention is not limited to any particular platform, unlessspecifically stated otherwise in the present disclosure.

Exemplary computer hardware means suitable for carrying out the invention can be a Silicon Graphics Power Challenge server with 10 R10000 processors running in parallel. A suitable software development environment includes CERIUS2 by Biosym/Molecular Simulations (San Diego, Calif.), or other equivalents.

The computational method described above has been effectively used inmodifying enzymes of the protein synthesis machinery (e.g. AARS) toallow incorporation of unnatural amino acids. The same suite ofcomputational tools can also be leveraged to design the final products(e.g., monoclonal antibodies or other therapeutics) in which theunnatural amino acids would be incorporated so as to enhance or modifytheir structural or functional properties.

While particular embodiments of the present invention have been shownand described, it will be apparent to those skilled in the art thatchanges and modifications may be made without departing from thisinvention in its broader aspect and, therefore, the appended claims areto encompass within their scope all such changes and modifications asfall within the true spirit of this invention.

2. Adoption of AARS from Different Organisms

A second strategy for generating an orthogonal tRNA/synthetase pairinvolves importing a tRNA/synthetase pair from another organism into thetranslation system of interest, such as Escherichia coli. In thisparticular example, the properties of the heterologous synthetasecandidate include, e.g., that it does not charge Escherichia coli tRNAreasonably well (preferably not at all), and the properties of theheterologous tRNA candidate include, e.g., that it is not acylated byEscherichia coli synthetase to a reasonable extent (preferably not atall). In addition, the O-tRNA derived therefrom is orthogonal to allEscherichia coli synthetases.

Schimmel et al. reported that Escherichia coli GlnRS (EcGlnRS) does not acylate Saccharomyces cerevisiae tRNAGln (EcGlnRS lacks an N-terminal RNA-binding domain possessed by Saccharomyces cerevisiae GlnRS (ScGlnRS)). See, E. F. Whelihan and P. Schimmel, EMBO J., 16:2968 (1997). For example, the Saccharomyces cerevisiae amber suppressor tRNAGln (SctRNAGlnCUA) was analyzed to determine whether it is also not a substrate for EcGlnRS. In vitro aminoacylation assays showed this to be the case; and in vitro suppression studies showed that the SctRNAGlnCUA is competent in translation. See, e.g., Liu and Schultz, Proc. Natl. Acad. Sci. USA, 96:4780 (1999). It was further shown that ScGlnRS does not acylate any Escherichia coli tRNA, only the SctRNAGlnCUA in vitro. The degree to which ScGlnRS is able to aminoacylate the SctRNAGlnCUA in Escherichia coli was also evaluated using an in vivo complementation assay. An amber nonsense mutation was introduced at a permissive site in the β-lactamase gene. Suppression of the mutation by an amber suppressor tRNA should produce full-length β-lactamase and confer ampicillin resistance to the cell. When only SctRNAGlnCUA is expressed, cells exhibit an IC₅₀ of 20 μg/mL ampicillin, indicating virtually no acylation by endogenous Escherichia coli synthetases; when SctRNAGlnCUA is coexpressed with ScGlnRS, cells acquire an IC₅₀ of about 500 μg/mL ampicillin, demonstrating that ScGlnRS acylates SctRNAGlnCUA efficiently in Escherichia coli. See, Liu and Schultz, Proc. Natl. Acad. Sci. USA, 96:4780 (1999). The Saccharomyces cerevisiae tRNAGlnCUA/GlnRS is orthogonal to Escherichia coli.

This strategy was also applied to a tRNA^(Asp)/AspRS system. Saccharomyces cerevisiae tRNA^(Asp) is known to be orthogonal to Escherichia coli synthetases. See, e.g., B. P. Doctor and J. A. Mudd, J. Biol. Chem., 238:3677 (1963); and, Y. Kwok and J. T. Wong, Can. J. Biochem., 58:213 (1980). It was demonstrated that an amber suppressor tRNA derived from it (SctRNA^(Asp) _(CUA)) is also orthogonal in Escherichia coli using the in vivo β-lactamase assay described above. However, the anticodon of tRNA^(Asp) is a critical recognition element of AspRS, see, e.g., R. Giege, C. Florentz, D. Kern, J. Gangloff, G. Eriani and D. Moras, Biochimie, 78:605 (1996), and mutation of the anticodon to CUA results in a loss of affinity of the suppressor for AspRS. An Escherichia coli AspRS E93K mutant has been shown to recognize Escherichia coli amber suppressor tRNA^(Asp) _(CUA) about an order of magnitude better than wt AspRS. See, e.g., F. Martin, ‘Thesis’, Universite Louis Pasteur, Strasbourg, France, 1995. It was speculated that introduction of the related mutation in Saccharomyces cerevisiae AspRS (E188K) might restore its affinity for SctRNA^(Asp) _(CUA). It was determined that the Saccharomyces cerevisiae AspRS(E188K) mutant does not acylate Escherichia coli tRNAs, but charges SctRNA^(Asp) _(CUA) with moderate efficiency as shown by in vitro aminoacylation experiments. See, e.g., M. Pastrnak, T. J. Magliery and P. G. Schultz, Helv. Chim. Acta, 83:2277 (2000).

A similar approach involves the use of a heterologous synthetase as theorthogonal synthetase but a mutant initiator tRNA of the same organismor a related organism as the orthogonal tRNA. RajBhandary and coworkersfound that an amber mutant of human initiator tRNA^(fMet) is acylated byEscherichia coli GlnRS and acts as an amber suppressor in yeast cellsonly when EcGlnRS is coexpressed. See, A. K. Kowal, C. Kohrer and U. L.RajBhandary, Proc. Natl. Acad. Sci. USA, 98:2268 (2001). This pair thusrepresents an orthogonal pair for use in yeast. Also, an Escherichiacoli initiator tRNA^(fMet) amber mutant was found that is inactivetoward any Escherichia coli synthetases. A mutant yeast TyrRS wasselected that charges this mutant tRNA, resulting in an orthogonal pairin Escherichia coli. See, A. K. Kowal, et al, (2001), supra.

Using the methods of the present invention, the pairs and components of pairs described above are evolved to generate orthogonal tRNA/synthetase pairs that possess desired characteristics, e.g., that can preferentially aminoacylate an O-tRNA with an unnatural amino acid.

In certain embodiments, the O-tRNA and the O-RS can be derived by mutation of a naturally occurring tRNA and RS from a variety of organisms. In one embodiment, the O-tRNA and O-RS are derived from at least one organism, where the organism is a prokaryotic organism, e.g., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium, Escherichia coli, A. fulgidus, P. furiosus, P. horikoshii, A. pernix, T. thermophilus, or the like. Optionally, the organism is a eukaryotic organism, e.g., plants (e.g., complex plants such as monocots, or dicots), algae, fungi (e.g., yeast, etc.), animals (e.g., mammals, insects, arthropods, etc.), insects, protists, or the like. Optionally, the O-tRNA is derived by mutation of a naturally occurring tRNA from a first organism and the O-RS is derived by mutation of a naturally occurring RS from a second organism. In one embodiment, the O-tRNA and O-RS can be derived from a mutated tRNA and mutated RS. In certain embodiments, the O-RS and O-tRNA pair from a first organism is provided to a translational system of a second organism, which optionally has a non-functional endogenous RS/tRNA pair with respect to the codons recognized by the O-tRNA.

The O-tRNA and the O-RS also can optionally be isolated from a variety of organisms. In one embodiment, the O-tRNA and O-RS are isolated from at least one organism, where the organism is a prokaryotic organism, e.g., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium, Escherichia coli, A. fulgidus, P. furiosus, P. horikoshii, A. pernix, T. thermophilus, or the like. Optionally, the organism is a eukaryotic organism, e.g., plants (e.g., complex plants such as monocots, or dicots), algae, fungi (e.g., yeast, etc.), animals (e.g., mammals, insects, arthropods, etc.), insects, protists, or the like. Optionally, the O-tRNA is isolated from a naturally occurring tRNA from a first organism and the O-RS is isolated from a naturally occurring RS from a second organism. In one embodiment, the O-tRNA and O-RS can be isolated from one or more libraries (which optionally comprise one or more O-tRNA and/or O-RS from one or more organisms, including those comprising prokaryotes and/or eukaryotes).

The orthogonal tRNA-RS pair, e.g., derived from at least a firstorganism or at least two organisms, which can be the same or different,can be used in a variety of organisms, e.g., a second organism. Thefirst and the second organisms of the methods of the present inventioncan be the same or different. As described above, the individualcomponents of a pair can be derived from the same organism or differentorganisms. For example, tRNA can be derived from a prokaryotic organism,e.g., an archaebacterium, such as Methanococcus jannaschii andHalobacterium NRC-1 or a eubacterium, such as Escherichia coli, whilethe synthetase can be derived from same or another prokaryotic organism,such as, Methanococcus jannaschii, Archaeoglobus fulgidus,Methanobacterium thermoautotrophicum, P. furiosus, P. horikoshii, A.pernix, T. thermophilus, Halobacterium, Escherichia coli or the like.Eukaryotic sources can also be used, e.g., plants (e.g., complex plantssuch as monocots, or dicots), algae, protists, fungi (e.g., yeast,etc.), animals (e.g., mammals, insects, arthropods, etc.), or the like.

Methods for selecting an orthogonal tRNA-tRNA synthetase pair for use in an in vivo translation system of a second organism are also included in the present invention. The methods include: introducing a marker gene, a tRNA and an aminoacyl-tRNA synthetase (RS) isolated or derived from a first organism into a first set of cells from the second organism; introducing the marker gene and the tRNA into a duplicate cell set from the second organism; and, selecting for surviving cells in the first set that fail to survive in the duplicate cell set, where the first set and the duplicate cell set are grown in the presence of a selection agent, and where the surviving cells comprise the orthogonal tRNA-tRNA synthetase pair for use in the in vivo translation system of the second organism. In one embodiment, comparing and selecting includes an in vivo complementation assay. In another embodiment, the concentration of the selection agent is varied. The same assay may also be conducted in an in vitro system based on the second organism.
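By way of illustration only, the comparison between the first cell set and the duplicate cell set amounts to a set difference over the candidates that survive selection. The clone labels and the function name are hypothetical.

```python
def select_orthogonal_pairs(survivors_with_rs, survivors_trna_only):
    """Keep candidates that survive when the tRNA and RS are both present
    but fail to survive when only the tRNA is present."""
    return sorted(set(survivors_with_rs) - set(survivors_trna_only))

# Hypothetical survival results under the same selection agent.
with_rs = {"clone_A", "clone_B", "clone_C"}   # marker + tRNA + RS
trna_only = {"clone_B"}                       # marker + tRNA, no RS

hits = select_orthogonal_pairs(with_rs, trna_only)
print(hits)
```

In this toy data set, clone_B is rejected because it survives without the introduced synthetase, which would suggest that its tRNA is being charged by an endogenous synthetase of the second organism.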

3. Generation of AARS by Mutagenesis and Selection/Screening

In certain embodiments, the AARS capable of charging a particular orthogonal tRNA with a particular unnatural amino acid can be obtained by mutagenesis of the AARS to generate a library of candidates, followed by screening and/or selection of the candidate AARSs capable of the desired function. Such orthogonal AARSs (O-RSs) and orthogonal tRNAs (O-tRNAs) may be used for in vitro/in vivo production of desired proteins with modified unnatural amino acids.

Thus methods for generating components of the protein biosyntheticmachinery, such as the O-RSs, O-tRNAs, and orthogonal O-tRNA/O-RS pairsthat can be used to incorporate an unnatural amino acid are provided inthe present invention. Methods for selecting an orthogonal tRNA-tRNAsynthetase pair for use in in vivo translation system of an organism arealso provided below.

In one embodiment, methods for producing at least one recombinantorthogonal aminoacyl-tRNA synthetase (O-RS) comprise: (a) generating alibrary of (optionally mutant) RSs derived from at least oneaminoacyl-tRNA synthetase (RS) from a first organism, e.g., a eukaryoticorganism (such as a yeast), or a prokaryotic organism, such asMethanococcus jannaschii, Methanobacterium thermoautotrophicum,Halobacterium, Escherichia coli, A. fulgidus, P. furiosus, P.horikoshii, A. pernix, T. thermophilus, or the like; (b) selecting(and/or screening) the library of RSs (optionally mutant RSs) formembers that aminoacylate an orthogonal tRNA (O-tRNA) in the presence ofan unnatural amino acid and a natural amino acid, thereby providing apool of active (optionally mutant) RSs; and/or, (c) selecting(optionally through negative selection) the pool for active RSs (e.g.,mutant RSs) that preferentially aminoacylate the O-tRNA in the absenceof the unnatural amino acid, thereby providing the at least onerecombinant O-RS; wherein the at least one recombinant O-RSpreferentially aminoacylates the O-tRNA with the unnatural amino acid.Recombinant O-RSs produced by the methods are also included in thepresent invention.

In one embodiment, the RS is an inactive RS. The inactive RS can begenerated by mutating an active RS. For example, the inactive RS can begenerated by mutating at least about 1, at least about 2, at least about3, at least about 4, at least about 5, at least about 6, or at leastabout 10 or more amino acids to different amino acids, e.g., alanine.

Libraries of mutant RSs can be generated using various mutagenesistechniques known in the art. For example, the mutant RSs can begenerated by site-specific mutations, random mutations, diversitygenerating recombination mutations, chimeric constructs, and by othermethods described herein or known in the art.

In one embodiment, selecting (and/or screening) the library of RSs (optionally mutant RSs) for members that are active, e.g., that aminoacylate an orthogonal tRNA (O-tRNA) in the presence of an unnatural amino acid and a natural amino acid, includes: introducing a positive selection or screening marker, e.g., an antibiotic resistance gene, or the like, and the library of (optionally mutant) RSs into a plurality of cells, wherein the positive selection and/or screening marker comprises at least one codon, whose translation (optionally conditionally) depends on the ability of a candidate O-RS to charge the O-tRNA (with either a natural and/or an unnatural amino acid); growing the plurality of cells in the presence of a selection agent; identifying cells that survive (or show a specific response) in the presence of the selection and/or screening agent by successfully translating the codon in the positive selection or screening marker, thereby providing a subset of positively selected cells that contains the pool of active (optionally mutant) RSs. Optionally, the selection and/or screening agent concentration can be varied. Preferably, the cells do not contain any functional endogenous tRNA/RS pair that can help to translate the codon. The endogenous tRNA/RS pair may be disabled by gene deletion and/or RS inhibitors.

Since many essential genes of the cell likely also contain such a codon, whose translation depends on the ability of the O-RS to charge the O-tRNA in the absence of a functional endogenous RS/tRNA pair, in one embodiment, no extra positive selection markers are needed for the positive selection process: the survival of the cell can be used as a readout of the positive selection process.

In one aspect, the positive selection marker is a chloramphenicolacetyltransferase (CAT) gene. Optionally, the positive selection markeris a β-lactamase gene. In another aspect the positive screening markercomprises a fluorescent or luminescent screening marker or an affinitybased screening marker (e.g., a cell surface marker).

In a similar embodiment, a cell-free in vitro system may be used to testthe ability of O-RS to charge O-tRNA in a positive screening. Forexample, the ability of the in vitro system to translate a positivescreening gene, such as a fluorescent marker gene, may depend on theability of O-RS to charge O-tRNA to read through a codon of the markergene.

In one embodiment, negatively selecting or screening the pool for active RSs (optionally mutants) that preferentially aminoacylate the O-tRNA in the absence of the unnatural amino acid includes: introducing a negative selection or screening marker with the pool of active (optionally mutant) RSs from the positive selection or screening into a plurality of translational systems, wherein the negative selection or screening marker comprises at least one codon (e.g., codon for a toxic marker gene, e.g., a ribonuclease barnase gene), whose translation depends on the ability of a candidate O-RS to charge the O-tRNA (with a natural amino acid); and, identifying the translation system that shows a specific screening response in a first media supplemented with the unnatural amino acid and a screening or selection agent, but fails to show the specific response in a second media supplemented with the natural amino acid and the selection or screening agent, thereby providing surviving cells or screened cells with the at least one recombinant O-RS.

For example, in an in vitro negative selection system, if the successful translation of a toxin gene depends on the ability of the O-RS to charge the O-tRNA to read through at least one codon of the toxin gene, the ability of the system to produce the toxin protein in the presence of the unnatural amino acid, but not in the presence of the natural amino acid, reflects the ability of the O-RS to charge the O-tRNA with the unnatural amino acid but not the natural amino acid.
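By way of illustration only, the combined logic of a positive selection followed by a negative selection can be sketched with a toy library in which each candidate synthetase is reduced to two Boolean properties. The class, the field names, and the four library members are hypothetical; real selections act on cell survival or reporter signals rather than on known properties.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Synthetase:
    name: str
    charges_unnatural: bool  # charges the O-tRNA with the unnatural amino acid
    charges_natural: bool    # charges the O-tRNA with a natural amino acid

def positive_selection(library):
    """Both amino acids present: any RS that charges the O-tRNA lets the
    cell translate the marker codon and survive."""
    return [rs for rs in library if rs.charges_unnatural or rs.charges_natural]

def negative_selection(pool):
    """Unnatural amino acid absent: an RS that charges the O-tRNA with a
    natural amino acid lets the toxic marker be translated, so it is lost."""
    return [rs for rs in pool if not rs.charges_natural]

library = [
    Synthetase("RS1", charges_unnatural=True, charges_natural=False),
    Synthetase("RS2", charges_unnatural=True, charges_natural=True),
    Synthetase("RS3", charges_unnatural=False, charges_natural=True),
    Synthetase("RS4", charges_unnatural=False, charges_natural=False),
]

active_pool = positive_selection(library)
selected = negative_selection(active_pool)
print([rs.name for rs in active_pool], [rs.name for rs in selected])
```

Only the member that charges the O-tRNA with the unnatural amino acid and not with a natural one passes both steps in this toy example.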

In one aspect, the concentration of the selection (and/or screening) agent is varied. In some aspects the first and second organisms are different. Thus, the first and/or second organism optionally comprises: a prokaryote, a eukaryote, a mammal, an Escherichia coli, a fungus, a yeast, an archaebacterium, a eubacterium, a plant, an insect, a protist, etc. In other embodiments, the screening marker comprises a fluorescent or luminescent screening marker or an affinity based screening marker.

Also, some aspects include wherein the negative selection marker comprises a ribonuclease barnase gene (which comprises at least one said codon). Other aspects include wherein the screening marker optionally comprises a fluorescent or luminescent screening marker or an affinity based screening marker. In the embodiments herein, the screenings and/or selections optionally include variation of the screening and/or selection stringency.

In one embodiment, the methods for producing at least one recombinantorthogonal aminoacyl-tRNA synthetase (O-RS) can further comprise: (d)isolating the at least one recombinant O-RS; (e) generating a second setof O-RS (optionally mutated) derived from the at least one recombinantO-RS; and, (f) repeating steps (b) and (c) until a mutated O-RS isobtained that comprises an ability to preferentially aminoacylate theO-tRNA. Optionally, steps (d)-(f) are repeated, e.g., at least about twotimes. In one aspect, the second set of mutated O-RS derived from atleast one recombinant O-RS can be generated by mutagenesis, e.g., randommutagenesis, site-specific mutagenesis, recombination or a combinationthereof.

The stringency of the selection/screening steps, e.g., the positive selection/screening step (b), the negative selection/screening step (c) or both the positive and negative selection/screening steps (b) and (c), in the above-described methods, is optionally varied. In another embodiment, the positive selection/screening step (b), the negative selection/screening step (c) or both the positive and negative selection/screening steps (b) and (c) comprise using a reporter, wherein the reporter is detected by fluorescence-activated cell sorting (FACS) or wherein the reporter is detected by luminescence. Optionally, the reporter is displayed on a cell surface, on a phage display or the like and selected based upon affinity or catalytic activity involving the unnatural amino acid or an analogue. In one embodiment, the mutated synthetase is displayed on a cell surface, on a phage display or the like.

The methods embodied herein optionally comprise wherein the unnatural amino acid is selected from, e.g.: an O-methyl-L-tyrosine, an L-3-(2-naphthyl)alanine, a 3-methyl-phenylalanine, an O-4-allyl-L-tyrosine, a 4-propyl-L-tyrosine, a tri-O-acetyl-GlcNAcβ-serine, an L-Dopa, a fluorinated phenylalanine, an isopropyl-L-phenylalanine, a p-azido-L-phenylalanine, a p-acyl-L-phenylalanine, a p-benzoyl-L-phenylalanine, an L-phosphoserine, a phosphonoserine, a phosphonotyrosine, a p-iodo-phenylalanine, a p-bromophenylalanine, and a p-amino-L-phenylalanine. A recombinant O-RS produced by the methods herein is also included in the current invention.

In a related aspect, methods for producing a recombinant orthogonal tRNA (O-tRNA) include: (a) generating a library of mutant tRNAs derived from at least one tRNA, from a first organism; (b) selecting (e.g., negatively selecting) or screening the library for (optionally mutant) tRNAs that are aminoacylated by an aminoacyl-tRNA synthetase (RS) from a second organism in the absence of a RS from the first organism, thereby providing a pool of tRNAs (optionally mutant); and, (c) selecting or screening the pool of tRNAs (optionally mutant) for members that are aminoacylated by an introduced orthogonal RS (O-RS), thereby providing at least one recombinant O-tRNA; wherein the at least one recombinant O-tRNA recognizes a degenerate codon and is not efficiently recognized by the RS from the second organism and is preferentially aminoacylated by the O-RS.

In some embodiments, the at least one tRNA preferentially binds to a degenerate codon with stronger affinity than that of a corresponding endogenous tRNA. In one embodiment, the recombinant O-tRNA possesses an improvement of orthogonality. It will be appreciated that in some embodiments, O-tRNA is optionally imported into a first organism from a second organism without the need for modification. In various embodiments, the first and second organisms are either the same or different and are optionally chosen from, e.g., prokaryotes (e.g., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Escherichia coli, Halobacterium, etc.), eukaryotes, mammals, fungi, yeasts, archaebacteria, eubacteria, plants, insects, protists, etc. Additionally, the recombinant tRNA is optionally aminoacylated by an unnatural amino acid, wherein the unnatural amino acid is biosynthesized in vivo either naturally or through genetic manipulation. The unnatural amino acid is optionally added to a growth medium for at least the first or second organism.

Methods for generating specific O-tRNA/O-RS pairs are provided. Methods include: (a) generating a library of mutant tRNAs derived from at least one tRNA from a first organism; (b) negatively selecting or screening the library for (optionally mutant) tRNAs that are aminoacylated by an aminoacyl-tRNA synthetase (RS) from a second organism in the absence of a RS from the first organism, thereby providing a pool of (optionally mutant) tRNAs; (c) selecting or screening the pool of (optionally mutant) tRNAs for members that are aminoacylated by an introduced orthogonal RS (O-RS), thereby providing at least one recombinant O-tRNA. The at least one recombinant O-tRNA preferentially recognizes a degenerate codon and is not efficiently recognized by the RS from the second organism and is preferentially aminoacylated by the O-RS. The method also includes (d) generating a library of (optionally mutant) RSs derived from at least one aminoacyl-tRNA synthetase (RS) from a third organism; (e) selecting or screening the library of mutant RSs for members that preferentially aminoacylate the at least one recombinant O-tRNA in the presence of an unnatural amino acid and a natural amino acid, thereby providing a pool of active (optionally mutant) RSs; and, (f) negatively selecting or screening the pool for active (optionally mutant) RSs that preferentially aminoacylate the at least one recombinant O-tRNA in the absence of the unnatural amino acid, thereby providing the at least one specific O-tRNA/O-RS pair, wherein the at least one specific O-tRNA/O-RS pair comprises at least one recombinant O-RS that is specific for the unnatural amino acid and the at least one recombinant O-tRNA. Specific O-tRNA/O-RS pairs produced by the methods are included. Additionally, such methods include wherein the first and third organism are the same (e.g., Methanococcus jannaschii).
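The positive and negative selection steps (e) and (f) can be pictured as two successive filters applied to a synthetase library. The following Python sketch is only an illustration under stated assumptions: the variant names, the numeric activity scores, and the threshold are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthetaseVariant:
    """Hypothetical library member; the activity scores are illustrative only."""
    name: str
    activity_unnatural: float  # assumed relative charging of O-tRNA with the unnatural amino acid
    activity_natural: float    # assumed relative charging of O-tRNA with a natural amino acid


def positive_selection(library, threshold):
    """Step (e): both amino acids are present, so any variant that charges the O-tRNA survives."""
    return [v for v in library
            if max(v.activity_unnatural, v.activity_natural) >= threshold]


def negative_selection(pool, threshold):
    """Step (f): unnatural amino acid absent; variants that still charge the O-tRNA are removed."""
    return [v for v in pool if v.activity_natural < threshold]


def select_specific(library, threshold=0.5):
    """Apply the positive filter, then the negative filter."""
    return negative_selection(positive_selection(library, threshold), threshold)


# Hypothetical four-member library covering the possible outcomes.
library = [
    SynthetaseVariant("inactive", 0.0, 0.1),
    SynthetaseVariant("promiscuous", 0.9, 0.8),
    SynthetaseVariant("natural_only", 0.1, 0.9),
    SynthetaseVariant("specific", 0.9, 0.1),
]
print([v.name for v in select_specific(library)])
```

In this toy library only the variant that is active with the unnatural amino acid and inactive with the natural amino acid passes both filters; raising or lowering the assumed threshold corresponds to varying the selection stringency.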

The organisms of the present invention comprise a variety of organisms and a variety of combinations. For example, the first and the second organisms of the methods of the present invention can be the same or different. In one embodiment, the organisms are optionally a prokaryotic organism, e.g., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium, Escherichia coli, A. fulgidus, P. furiosus, P. horikoshii, A. pernix, T. thermophilus, or the like. Alternatively, the organisms optionally comprise a eukaryotic organism, e.g., plants (e.g., complex plants such as monocots, or dicots), algae, protists, fungi (e.g., yeast, etc), animals (e.g., mammals, insects, arthropods, etc.), or the like. In another embodiment, the second organism is a prokaryotic organism, e.g., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium, Escherichia coli, A. fulgidus, P. furiosus, P. horikoshii, A. pernix, T. thermophilus, or the like. Alternatively, the second organism can be a eukaryotic organism, e.g., a yeast, an animal cell, a plant cell, a fungus, a mammalian cell, or the like. In various embodiments the first and second organisms are different.

The various methods of the invention (above) optionally comprise wherein selecting or screening comprises one or more positive or negative selection or screening, e.g., a change in amino acid permeability, a change in translation efficiency, and a change in translational fidelity. Additionally, the one or more change is optionally based upon a mutation in one or more gene in an organism in which an orthogonal tRNA-tRNA synthetase pair are used to produce such protein. Selecting and/or screening herein optionally comprises wherein at least 2 codons within one or more selection gene or within one or more screening gene are used. Such multiple codons are optionally within the same gene or within different screening/selection genes. Additionally, the optional multiple codons are optionally different codons or comprise the same type of codons.

Kits are an additional feature of the invention. For example, the kits can include one or more translation system as noted above (e.g., a cell), one or more unnatural amino acid, e.g., with appropriate packaging material, containers for holding the components of the kit, instructional materials for practicing the methods herein and/or the like. Similarly, products of the translation systems (e.g., proteins such as EPO analogues comprising unnatural amino acids) can be provided in kit form, e.g., with containers for holding the components of the kit, instructional materials for practicing the methods herein and/or the like.

VI. Nucleic Acid and Polypeptide Sequence Variants

As described herein, the invention provides for nucleic acid polynucleotide sequences and polypeptide amino acid sequences, e.g., O-tRNAs and O-RSs (and their coding polynucleotides), and, e.g., compositions and methods comprising the sequences. Examples of the sequences, e.g., O-tRNAs and O-RSs, are disclosed herein. However, one of skill in the art will appreciate that the invention is not limited to those sequences disclosed herein. One of skill will appreciate that the present invention also provides many related and unrelated sequences with the functions described herein, e.g., encoding an O-tRNA or an O-RS.

One of skill will also appreciate that many variants of the disclosed sequences are included in the invention. For example, conservative variations of the disclosed sequences that yield a functionally identical sequence are included in the invention. Variants of the nucleic acid polynucleotide sequences, wherein the variants hybridize to at least one disclosed sequence, are considered to be included in the invention. Unique subsequences of the sequences disclosed herein, as determined by, e.g., standard sequence comparison techniques, are also included in the invention.

VII. Exemplary Uses

Well over 100 non-coded amino acids (all ribosomally acceptable) have been reportedly introduced into proteins using other methods (see, for example, Schultz et al., J. Am. Chem. Soc., 103: 1563-1567, 1981; Hinsberg et al., J. Am. Chem. Soc., 104: 766-773, 1982; Pollack et al., Science, 242: 1038-1040, 1988; Nowak et al., Science, 268: 439-442, 1995). All these analogs may be used in the subject methods for efficient incorporation into protein products. In general, the method of the instant invention can be used to incorporate amino acid analogs into protein products either in vitro or in vivo.

In another preferred embodiment, two or more analogs may be used in the same in vitro or in vivo translation system, each with its O-tRNA/O-RS pairs. This is more easily accomplished when a natural amino acid is encoded by four or more codons (such as six for Leu and Arg). However, for amino acids encoded by only two codons, one can be reserved for the natural amino acid, while the other is “shared” by one or more amino acid analog(s). These analogs may resemble only one natural amino acid (for example, different Phe analogs), or resemble different amino acids (for example, analogs of Phe and Tyr).
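The codon counts referred to above can be checked directly against the standard genetic code. The following Python sketch tabulates codons per amino acid and lists the amino acids with four or more codons and those with exactly two; the helper names are illustrative, and the table is the standard code in the conventional U, C, A, G ordering.

```python
from collections import defaultdict

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codons_by_amino_acid():
    """Map each one-letter amino acid code to the list of codons that encode it."""
    table = defaultdict(list)
    codons = (a + b + c for a in BASES for b in BASES for c in BASES)
    for codon, amino_acid in zip(codons, AMINO_ACIDS):
        table[amino_acid].append(codon)
    return dict(table)


def amino_acids_with_degeneracy(predicate):
    """Return the amino acids (stops excluded) whose codon count satisfies the predicate."""
    return sorted(aa for aa, codons in codons_by_amino_acid().items()
                  if aa != "*" and predicate(len(codons)))


four_or_more = amino_acids_with_degeneracy(lambda n: n >= 4)
exactly_two = amino_acids_with_degeneracy(lambda n: n == 2)
print("four or more codons:", four_or_more)
print("exactly two codons:", exactly_two)
print("Leu codons:", codons_by_amino_acid()["L"])
```

For a two-codon amino acid such as Phe (UUU and UUC), the sketch makes the trade-off concrete: one codon remains for the natural amino acid and only one is left to be reassigned.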

For in vitro use, one or more O-RSs of the instant invention can be recombinantly produced and supplied to any of the available in vitro translation systems (such as the commercially available Wheat Germ Lysate-based PROTEINscript-PRO™, Ambion's E. coli system for coupled in vitro transcription/translation; or the rabbit reticulocyte lysate-based Retic Lysate IVT™ Kit from Ambion). Optionally, the in vitro translation system can be selectively depleted of one or more natural AARSs (by, for example, immunodepletion using immobilized antibodies against natural AARS) and/or natural amino acids so that enhanced incorporation of the analog can be achieved. Alternatively, nucleic acids encoding the re-designed O-RSs may be supplied in place of recombinantly produced AARSs. The in vitro translation system is also supplied with the analogs to be incorporated into mature protein products.

Although in vitro protein synthesis usually cannot be carried out on the same scale as in vivo synthesis, in vitro methods can yield hundreds of micrograms of purified protein containing amino acid analogs. Such proteins have been produced in quantities sufficient for their characterization using circular dichroism (CD), nuclear magnetic resonance (NMR) spectrometry, and X-ray crystallography. This methodology can also be used to investigate the role of hydrophobicity, packing, side chain entropy and hydrogen bonding in determining protein stability and folding. It can also be used to probe catalytic mechanism, signal transduction and electron transfer in proteins. In addition, the properties of proteins can be modified using this methodology. For example, photocaged proteins can be generated that can be activated by photolysis, and novel chemical handles have been introduced into proteins for the site specific incorporation of optical and other spectroscopic probes.

The development of a general approach for the incorporation of amino acid analogs into proteins in vivo, directly from the growth media, would greatly enhance the power of unnatural amino acid mutagenesis. For example, the ability to synthesize large quantities of proteins containing heavy atoms would facilitate protein structure determination, and the ability to site-specifically substitute fluorophores or photocleavable groups into proteins in living cells would provide powerful tools for studying protein function in vivo. Alternatively, one might be able to enhance the properties of proteins by providing building blocks with new functional groups, such as a keto-containing amino acid.

For in vivo use, one or more AARS of the instant invention can be supplied to a host cell (prokaryotic or eukaryotic) as genetic materials, such as coding sequences on plasmids or viral vectors, which may optionally integrate into the host genome and constitutively or inducibly express the re-designed AARSs. A heterologous or endogenous protein of interest can be expressed in such a host cell, in the presence of supplied amino acid analogs. The protein products can then be purified using any art-recognized protein purification techniques, or techniques specially designed for the protein of interest.

The above described uses are merely a few possible means for generating a transcript which encodes a polypeptide. In general, any means known in the art for generating transcripts can be employed to synthesize proteins with amino acid analogs. For example, any in vitro transcription system or coupled transcription/translation system can be used to generate a transcript of interest, which then serves as a template for protein synthesis. Alternatively, any cell, engineered cell/cell line, or functional components (lysates, membrane fractions, etc.) that is capable of expressing proteins from genetic materials can be used to generate a transcript. These means for generating a transcript will typically include such components as RNA polymerase (T7, SP6, etc.) and co-factors, nucleotides (ATP, CTP, GTP, UTP), necessary transcription factors, and appropriate buffer conditions, as well as at least one suitable DNA template, but other components may also be added for optimized reaction conditions. A skilled artisan would readily envision other embodiments similar to those described herein.

The following section describes a few specific uses of the instant methods and systems for unnatural amino acid incorporation. These are meant to be illustrative and by no means limiting in any respect.

A. Long-Acting Human Protein Pharmaceuticals

Most administered protein pharmaceuticals are cleared rapidly from the body, necessitating frequent, often daily injections. Thus there is considerable interest in developing long-acting protein therapeutics that are able to maintain efficacious levels in the body for long periods of time, providing patients with greater therapeutic benefits. For example, PEGylation-based drug delivery technology, the most commonly used method for increasing protein half-life, is already used in six approved drugs, with annual sales exceeding an aggregate of US$3.0 billion. The field is expanding rapidly, with over a dozen additional PEGylation-based drugs in the product pipelines of leading biotechnology and pharmaceutical companies.

PEGylation is a process to covalently attach oligosaccharides and synthetic polymers such as polyethylene glycol (PEG) site-specifically onto therapeutic protein molecules. PEGylation can significantly enhance protein half-life by shielding the polypeptide from proteolytic enzymes and increasing the apparent size of the protein, thus reducing clearance rates. Moreover, PEG conjugates can enhance protein solubility and have beneficial effects on biodistribution. The physical and pharmacological properties of PEGylated proteins are affected by the number and the size of PEG chains attached to the polypeptide, the location of the PEG sites, and the chemistry used for PEGylation. Examples of PEG conjugation to proteins include reactions of N-hydroxysuccinimidyl ester derivatized PEGs with lysine, 1,4-addition reactions of maleimide and vinylsulfone derivatized PEGs with cysteine, and condensation of hydrazide containing PEGs with aldehydes generated by oxidation of glycoproteins. When more than one reactive site is present in a protein (e.g., multiple amino or thiol groups) or reactive electrophiles are used, nonselective attachment of one or multiple PEG molecules can occur, leading to the generation of a heterogeneous mixture that is difficult to separate. The lack of selectivity and positional control in the attachment of PEG chains can lead to significant losses in biological activity and possibly enhanced immunogenicity of the conjugated protein. In fact, historically, loss of biological activity and product heterogeneity have been the two most common problems encountered in the development of long-acting protein pharmaceuticals using standard PEGylation techniques. Modification of proteins with amine-reactive PEGs typically results in drastic loss of biological activity due to modification of lysine residues located in regions of the protein important for biological activity. In certain situations, bioactivity of growth hormones may be reduced 400-fold or more. For example, bioactivity of G-CSF is reduced 1,000-fold when the proteins are modified using conventional amine-PEGylation technologies (Clark et al., J. Biol. Chem. 271: 21969, 1996; Bowen et al., Exp. Hematol. 27, 425, 1999). Thus there is a need for a method that allows for the completely site-specific and irreversible attachment of PEG chains to proteins.

It would be advantageous to use advanced protein engineering technologies to create long-acting, “patient friendly” human protein pharmaceuticals, by, for example, incorporating unnatural amino acids into a drug protein, such that the engineered drug protein may achieve longer half life and/or sustained or even enhanced biological activity. Towards this end, the instant invention may be used to overcome problems such as heterogeneity and loss of activity inherent in standard amine-PEGylation techniques. Incorporating unnatural amino acids will provide unique, pre-determined sites away from the binding or the catalytic site on the target protein where PEG molecules can be site-specifically conjugated. In addition, PEG molecules may be attached to unnatural amino acids through techniques other than amine-PEGylation, thus sparing the primary amine groups of lysines from undesirable PEGylation. The major advantages of such protein engineering technologies include the creation of next-generation, proprietary proteins that:

-   Are homogeneously modified
-   Retain high biological activity and remain longer in the body
-   Have increased potency and stability and decreased immunogenicity
-   Are consistent lot to lot in biological activities

These techniques may be used to enhance the half-life, efficacy, and/or safety of bio-pharmaceuticals in all areas, including the specific fields of cancer, endocrinology, infectious disease, and inflammation, etc.

As an illustrative example, because the copper-mediated Huisgen [3+2] cycloaddition (Tornoe et al., J. Org. Chem. 67: 3057, 2002; Rostovtsev et al., Angew. Chem., Int. Ed. 41: 596, 2002; and Wang et al., J. Am. Chem. Soc. 125: 3192, 2003) of an azide and an alkyne is orthogonal to all functional groups found in proteins and forms a stable triazole linkage, this reaction can be used for the selective PEGylation of proteins. For example, Deiters et al. (Bioorg. Med. Chem. Lett. 14(23): 5743-5745, 2004) report a generally applicable PEGylation methodology based on the site-specific incorporation of para-azidophenylalanine into proteins in yeast. The azido group was used in a mild [3+2] cycloaddition reaction with an alkyne derivatized PEG reagent to afford selectively PEGylated protein. This strategy should be useful for the generation of selectively PEGylated proteins for therapeutic applications.

B. Enhance Half-Life of Cytokines and Growth Factors Through Increased Recycling

Besides clearance through kidneys and the liver, a significant proportion of biotherapeutics are cleared through receptor-mediated degradation. Cytokines and growth factors, when bound to their receptors, are internalized into cellular compartments called endosomes where the receptor-ligand complexes are degraded. However, those ligands that dissociate rapidly from their receptors in the endosome are recycled back to the cell surface and avoid depletion, resulting in increased half-life.

Sarkar et al. reported an approach to use natural amino acids to design a variant of G-CSF, which has reduced binding affinity for its receptor in the endosome, thus achieving a half-life of 500 hours, compared to only about 50 hours for unmodified G-CSF (Sarkar et al., Nature Biotechnology 20, 908-913, 2002). Specifically, Sarkar et al. used computationally predicted histidine substitutions that switch protonation states between cell-surface and endosomal pH. Molecular modeling of binding electrostatics indicates two different single-histidine mutants that fulfill the design requirements. Experimental assays demonstrate that each mutant indeed exhibits an order-of-magnitude increase in medium half-life along with enhanced potency due to increased endocytic recycling.
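To give a sense of what a ten-fold change in half-life means in practice, the following Python sketch compares the fraction of ligand remaining over time for the two half-lives quoted above. It assumes simple first-order (exponential) loss, which is an illustrative simplification and not a kinetic model taken from the cited work; the 100-hour time point is likewise arbitrary.

```python
def fraction_remaining(elapsed_hours, half_life_hours):
    """Fraction of ligand left after elapsed_hours, assuming first-order decay."""
    return 0.5 ** (elapsed_hours / half_life_hours)


# Half-lives quoted in the text: about 50 h (unmodified) and 500 h (variant).
for half_life in (50, 500):
    remaining = fraction_remaining(100, half_life)
    print(f"half-life {half_life} h: {remaining:.3f} of ligand remains after 100 h")
```

Under this assumption, a quarter of the unmodified ligand is left after 100 hours, whereas most of the longer-lived variant is still present.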

However, chemistries offered by natural amino acids to modulate the binding process are limited in number and scope. In contrast, unnatural amino acids will offer a significantly better spectrum of useful chemistries, and thus more control on ligand-receptor binding affinities. Such improvements will exhibit more efficient ligand recycling, leading to an increase in ligand half-life by orders of magnitude. This method for designing cytokines and growth factors that exhibit reduced receptor-mediated degradation will be very useful in providing an alternative strategy for increasing half-life of those molecules that are not amenable to PEGylation.

Thus the instant invention provides a method to incorporate unnatural amino acids, the unique chemistries of which can be leveraged for designing the next generation of cytokines and growth factors that maintain high binding affinities for receptors on the cell surface, while having significantly lower binding affinities once they are internalized.

C. Glycosylation Through Unnatural Amino Acids

The post-translational modification of proteins by glycosylation can affect protein folding and stability, modify the intrinsic activity of proteins, and modulate their interactions with other biomolecules. See, e.g., Varki, Glycobiology 3: 97-130, 1993. Natural glycoproteins are often present as a population of many different glycoforms, which makes analysis of glycan structure and the study of glycosylation effects on protein structure and function difficult. Therefore, methods for the synthesis of natural and unnatural homogeneously glycosylated proteins are needed for the systematic understanding of glycan function, and for the development of improved glycoprotein therapeutics.

One previously known approach for making proteins having desired glycosylation patterns makes use of glycosidases to convert a heterogeneous natural glycoprotein to a simple homogeneous core, onto which saccharides can then be grafted sequentially with glycosyl transferases. See, e.g., Witte et al., J. Am. Chem. Soc. 119: 2114-2118, 1997. A limitation of this approach is that the primary glycosylation sites are predetermined by the cell line in which the protein is expressed. Alternatively, a glycopeptide containing the desired glycan structure can be synthesized by solid phase peptide synthesis. This glycopeptide can be coupled to other peptides or recombinant protein fragments to afford a larger glycoprotein by native chemical ligation (see, e.g., Shin et al., J. Am. Chem. Soc. 121: 11684-11689, 1999), expressed protein ligation (see, e.g., Tolbert and Wong, J. Am. Chem. Soc. 122: 5421-5428, 2000), or with engineered proteases (see, e.g., Witte et al., J. Am. Chem. Soc. 120: 1979-1989, 1998). Both native chemical ligation and expressed protein ligation are most effective with small proteins, and necessitate a cysteine residue at the N-terminus of the glycopeptide. When a protease is used to ligate peptides together, the ligation site must be placed far away from the glycosylation site for good coupling yields. See, e.g., Witte et al., J. Am. Chem. Soc. 120: 1979-1989, 1998. A third approach is to modify proteins with saccharides directly using chemical methods. Good selectivity can be achieved with haloacetamide saccharide derivatives, which are coupled to the thiol group of cysteine (see, e.g., Davis and Flitsch, Tetrahedron Lett. 32: 6793-6796, 1991; and Macmillan et al., Org Lett 4: 1467-1470, 2002). But this method can become problematic with proteins that have more than one cysteine residue.

Accordingly, a need exists for improved methods for making glycoproteins having a desired glycosylation pattern. The instant invention fulfills this and other needs.

The instant invention provides methods for synthesis of glycoproteins. These methods involve, in some embodiments, incorporating into a protein an unnatural amino acid that comprises a first reactive group; and contacting the protein with a saccharide moiety that comprises a second reactive group, wherein the first reactive group reacts with the second reactive group, thereby forming a covalent bond that attaches the saccharide moiety to the unnatural amino acid of the protein.

Glycoproteins produced by these methods are also included in the invention.

The first reactive group is, in some embodiments, an electrophilic moiety (e.g., a keto moiety, an aldehyde moiety, and/or the like), and the second reactive group is a nucleophilic moiety. In some embodiments, the first reactive group is a nucleophilic moiety and the second reactive group is an electrophilic moiety (e.g., a keto moiety, an aldehyde moiety, and/or the like). For example, an electrophilic moiety is attached to the saccharide moiety and the nucleophilic moiety is attached to the unnatural amino acid. The saccharide moiety can include a single carbohydrate moiety, or the saccharide moiety can include two or more carbohydrate moieties.

In some embodiments, the methods further involve contacting the saccharide moiety with a glycosyl transferase, a sugar donor moiety, and other reactants required for glycosyl transferase activity for a sufficient time and under appropriate conditions to transfer a sugar from the sugar donor moiety to the saccharide moiety. The product of this reaction can, if desired, be contacted by at least a second glycosyl transferase, together with the appropriate sugar donor moiety.

In certain embodiments, the method further comprises contacting the saccharide moiety with one or more of a β1-4 N-acetylglucosaminyl transferase, an α1,3-fucosyl transferase, an α1,2-fucosyl transferase, an α1,4-fucosyl transferase, a β1-4-galactosyl transferase, a sialyl transferase, and/or the like, to form a biantennary or triantennary oligosaccharide structure.

In one embodiment, the saccharide moiety comprises a terminal GlcNAc, the sugar donor moiety is UDP-Gal and the glycosyl transferase is a β-1,4-galactosyl transferase.

In one embodiment, the saccharide moiety comprises a terminal GlcNAc, the sugar donor moiety is UDP-GlcNAc and the glycosyl transferase is a β1-4 N-acetylglucosaminyl transferase.

Optionally, the method further comprises contacting the product of the N-acetylglucosaminyl transferase reaction with a β1-4 mannosyl transferase and GDP-mannose to form a saccharide moiety that comprises Manβ1-4GlcNAcβ1-4GlcNAc-. Optionally, the method further comprises contacting the Manβ1-4GlcNAcβ1-4GlcNAc- moiety with an α1-3 mannosyl transferase and GDP-mannose to form a saccharide moiety that comprises Manα1-3Manβ1-4GlcNAcβ1-4GlcNAc-. Optionally, the method further comprises contacting the Manα1-3Manβ1-4GlcNAcβ1-4GlcNAc- moiety with an α1-6 mannosyl transferase and GDP-mannose to form a saccharide moiety that comprises Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc-. Optionally, the method further comprises contacting the Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc- moiety with a β1-2 N-acetylglucosaminyl transferase and UDP-GlcNAc to form a saccharide moiety that comprises Manα1-6(GlcNAcβ1-2Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc-. Optionally, the method further comprises contacting the Manα1-6(GlcNAcβ1-2Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc- moiety with a β1-2 N-acetylglucosaminyl transferase and UDP-GlcNAc to form a saccharide moiety that comprises GlcNAcβ1-2Manα1-6(GlcNAcβ1-2Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc-.
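The stepwise build-up described above can be followed more easily by treating the saccharide moiety as a small tree to which each transferase step adds one residue. The Python sketch below is only a bookkeeping aid: the `Residue` class, its `render` convention (the branch at the higher linkage position is written first, other branches in parentheses), and the variable names are assumptions made for illustration, and no enzymology is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str       # e.g. "Man" or "GlcNAc"
    linkage: str    # linkage to the parent residue, e.g. "β1-4"; "-" at the attachment end
    children: list = field(default_factory=list)

    def add(self, name, linkage):
        """Attach one residue, as a single glycosyl transferase step would."""
        child = Residue(name, linkage)
        self.children.append(child)
        return child

    def render(self):
        """Linear notation: highest-position branch first, other branches in parentheses."""
        ordered = sorted(self.children, key=lambda c: int(c.linkage[-1]), reverse=True)
        parts = [c.render() if i == 0 else "(" + c.render() + ")"
                 for i, c in enumerate(ordered)]
        return "".join(parts) + self.name + self.linkage


stages = []
root = Residue("GlcNAc", "-")            # terminal GlcNAc of the starting saccharide moiety
second = root.add("GlcNAc", "β1-4")      # β1-4 N-acetylglucosaminyl transferase step
stages.append(root.render())
core_man = second.add("Man", "β1-4")     # β1-4 mannosyl transferase step
stages.append(root.render())
arm3 = core_man.add("Man", "α1-3")       # α1-3 mannosyl transferase step
stages.append(root.render())
arm6 = core_man.add("Man", "α1-6")       # α1-6 mannosyl transferase step
stages.append(root.render())
arm3.add("GlcNAc", "β1-2")               # β1-2 N-acetylglucosaminyl transferase, α1-3 arm
stages.append(root.render())
arm6.add("GlcNAc", "β1-2")               # β1-2 N-acetylglucosaminyl transferase, α1-6 arm
stages.append(root.render())

for number, structure in enumerate(stages, start=1):
    print(number, structure)
```

Each printed line corresponds to the product named after the respective optional step in the text, ending with the biantennary structure.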

The step of incorporating into a protein an unnatural amino acid that comprises a first reactive group, in some embodiments, comprises using an orthogonal tRNA/orthogonal aminoacyl-tRNA synthetase (O-tRNA/O-RS) pair, where the O-tRNA preferentially recognizes a degenerate codon for wild-type tRNA, and incorporates the unnatural amino acid into the protein in response to the degenerate codon, and wherein the O-RS preferentially aminoacylates the O-tRNA with the unnatural amino acid. In some embodiments, the unnatural amino acid is incorporated into the polypeptide in vivo.
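The distinction between strict Watson-Crick pairing and wobble pairing at a degenerate codon can be illustrated with a short Python sketch. The Phe codons UUU and UUC and the anticodons GAA and AAA are used purely as an illustrative case; the wobble rule is deliberately simplified to G-U pairs only (modified bases such as inosine are ignored), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplifying assumption: only G-U pairs count as wobble, given as (anticodon base, codon base).
WOBBLE = {("G", "U"), ("U", "G")}


def watson_crick_anticodon(codon):
    """Return the anticodon (5'->3') that pairs with every codon base by Watson-Crick rules."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def pairing_type(codon, anticodon):
    """Classify codon-anticodon pairing as 'watson-crick', 'wobble', or 'none'."""
    # Strands are antiparallel: codon positions 1-3 pair with anticodon positions 3-1.
    partner = anticodon[::-1]
    if any(COMPLEMENT[c] != a for c, a in zip(codon[:2], partner[:2])):
        return "none"  # the first two codon positions must pair strictly
    if COMPLEMENT[codon[2]] == partner[2]:
        return "watson-crick"
    if (partner[2], codon[2]) in WOBBLE:
        return "wobble"
    return "none"


for anticodon in ("GAA", "AAA"):
    for codon in ("UUC", "UUU"):
        print(f"anticodon {anticodon} / codon {codon}: {pairing_type(codon, anticodon)}")
```

In this simplified picture, an anticodon GAA reads UUC by Watson-Crick pairing and UUU only by wobble pairing, whereas an anticodon AAA pairs strictly with UUU, which is the kind of degenerate codon an O-tRNA would be designed to recognize preferentially.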

The invention also provides glycoproteins that comprise a saccharide moiety and a polypeptide. In certain embodiments in the glycoproteins of the invention, the saccharide moiety is attached to the polypeptide by a reaction product of a nucleophilic reaction between a first reactive group attached to an unnatural amino acid present in the polypeptide and a second reactive group attached to the saccharide moiety. In certain embodiments, the first reactive group is an electrophilic moiety (e.g., keto moiety, aldehyde moiety, and/or the like) and the second reactive group is a nucleophilic moiety.

A wide variety of suitable reactive groups are known to those of skill in the art. Such suitable reactive groups can include, for example, amino, hydroxyl, carboxyl, carboxylate, carbonyl, alkenyl, alkynyl, aldehyde, ester, ether (e.g. thio-ether), amide, amine, nitrile, vinyl, sulfide, sulfonyl, phosphoryl, or similarly chemically reactive groups. Additional suitable reactive groups include, but are not limited to, maleimide, N-hydroxysuccinimide, sulfo-N-hydroxysuccinimide, nitrilotriacetic acid, activated hydroxyl, haloacetyl (e.g., bromoacetyl, iodoacetyl), activated carboxyl, hydrazide, epoxy, aziridine, sulfonylchloride, trifluoromethyldiaziridine, pyridyldisulfide, N-acyl-imidazole, imidazolecarbamate, vinylsulfone, succinimidylcarbonate, arylazide, anhydride, diazoacetate, benzophenone, isothiocyanate, isocyanate, imidoester, fluorobenzene, biotin and avidin.

In some embodiments, one of the reactive groups is an electrophilic moiety, and the second reactive group is a nucleophilic moiety. Either the nucleophilic moiety or the electrophilic moiety can be attached to the side-chain of the unnatural amino acid; the corresponding group is then attached to the saccharide moiety.

Suitable electrophilic moieties that react with nucleophilic moieties to form a covalent bond are known to those of skill in the art. In certain embodiments, such electrophilic moieties include, but are not limited to, e.g., a carbonyl group, a sulfonyl group, an aldehyde group, a ketone group, a hindered ester group, a thioester group, a stable imine group, an epoxide group, an aziridine group, etc.

Suitable nucleophilic moieties that can react with electrophilic moieties are known to those of skill in the art. In certain embodiments, such nucleophiles include, for example, aliphatic or aromatic amines, such as ethylenediamine. In certain embodiments, the nucleophilic moieties include, but are not limited to, e.g., —NR1-NH₂ (hydrazide), —NR1(C═O)NR2NH₂ (semicarbazide), —NR1(C═S)NR2NH₂ (thiosemicarbazide), —(C═O)NR1NH₂ (carbonylhydrazide), —(C═S)NR1NH₂ (thiocarbonylhydrazide), —(SO₂)NR1NH₂ (sulfonylhydrazide), —NR1NR2(C═O)NR3NH₂ (carbazide), —NR1NR2(C═S)NR3NH₂ (thiocarbazide), —O—NH₂ (hydroxylamine), and the like, where each R1, R2, and R3 is independently H, or alkyl having 1-6 carbons, preferably H. In certain embodiments, the reactive group is a hydrazide, hydroxylamine, semicarbazide, carbohydrazide, sulfonylhydrazide, or the like.

The product of the reaction between the nucleophile and the electrophilic moiety typically incorporates the atoms originally present in the nucleophilic moiety. Typical linkages obtained by reacting the aldehydes or ketones with the nucleophilic moieties include reaction products such as an oxime, an amide, a hydrazone, a reduced hydrazone, a carbohydrazone, a thiocarbohydrazone, a sulfonylhydrazone, a semicarbazone, a thiosemicarbazone, or similar functionality, depending on the nucleophilic moiety used and the electrophilic moiety (e.g., aldehyde, ketone, and/or the like) that is reacted with the nucleophilic moiety. Linkages with carboxylic acids are typically referred to as carbohydrazides or as hydroxamic acids. Linkages with sulfonic acids are typically referred to as sulfonylhydrazides or N-sulfonylhydroxylamines. The resulting linkage can be subsequently stabilized by chemical reduction.

Other aspects of the invention include methods for synthesis of a glycoprotein by incorporating into a protein an unnatural amino acid that comprises a saccharide moiety. A glycoprotein produced by the method is also a feature of the invention. In certain embodiments, the incorporating step comprises using an orthogonal tRNA/orthogonal aminoacyl-tRNA synthetase (O-tRNA/O-RS) pair, wherein the O-tRNA recognizes a degenerate codon and incorporates the unnatural amino acid that comprises a saccharide moiety (e.g., a β-O-GlcNAc-L-serine, a tri-acetyl-β-GlcNAc-serine, a tri-O-acetyl-GalNAc-α-threonine, an α-GalNAc-L-threonine, and/or the like) into the protein in response to the degenerate codon, and wherein the O-RS preferentially aminoacylates the O-tRNA with the unnatural amino acid. In one embodiment, the incorporating step is performed in vivo.

These methods can further involve contacting the saccharide moiety with a glycosyl transferase, a sugar donor moiety, and other reactants required for glycosyl transferase activity for a sufficient time and under appropriate conditions to transfer a sugar from the sugar donor moiety to the saccharide moiety. In certain embodiments, the method further comprises contacting the product of the glycosyl transferase reaction with at least a second glycosyl transferase and a second sugar donor moiety. In other words, the invention provides methods in which an amino acid-linked saccharide moiety or an unnatural amino acid that includes a saccharide moiety is further glycosylated. These glycosylation steps are preferably (though not necessarily) carried out enzymatically using, for example, a glycosyltransferase, glycosidase, or other enzyme known to those of skill in the art. In some embodiments, a plurality of enzymatic steps are carried out in a single reaction mixture that contains two or more different glycosyl transferases. For example, one can conduct a galactosylating and a sialylating step simultaneously by including both sialyl transferase and galactosyl transferase in the reaction mixture.

For enzymatic saccharide syntheses that involve glycosyl transferase reactions, the recombinant cells of the invention optionally contain at least one heterologous gene that encodes a glycosyl transferase. Many glycosyl transferases are known, as are their polynucleotide sequences. See, e.g., “The WWW Guide To Cloned Glycosyl transferases,” (available on the World Wide Web). Glycosyl transferase amino acid sequences and nucleotide sequences encoding glycosyl transferases from which the amino acid sequences can be deduced are also found in various publicly available databases, including GenBank, Swiss-Prot, EMBL, and others.

In certain embodiments, a glycosyl transferase of the invention includes, but is not limited to, e.g., a galactosyl transferase, a fucosyl transferase, a glucosyl transferase, an N-acetylgalactosaminyl transferase, an N-acetylglucosaminyl transferase, a glucuronyl transferase, a sialyl transferase, a mannosyl transferase, a glucuronic acid transferase, a galacturonic acid transferase, an oligosaccharyl transferase, and the like. Suitable glycosyl transferases include those obtained from eukaryotes or prokaryotes.

An acceptor for the glycosyl transferases will be present on the glycoprotein to be modified by the methods of the invention. Suitable acceptors include, for example, galactosyl acceptors such as Galβ1,4GalNAc-; Galβ1,3GalNAc-; lacto-N-tetraose-; Galβ1,3GlcNAc-; Galβ1,4GlcNAc-; Galβ1,3Ara-; Galβ1,6GlcNAc-; and Galβ1,4Glc- (lactose). Other acceptors are known to those of skill in the art (see, e.g., Paulson et al., J. Biol. Chem. 253: 5617-5624, 1978). Typically, the acceptors form part of a saccharide moiety chain that is attached to the glycoprotein.

In one embodiment, the saccharide moiety comprises a terminal GlcNAc, the sugar donor moiety is UDP-GlcNAc and the glycosyl transferase is a β1-4-N-acetylglucosaminyl transferase. In another embodiment, the saccharide moiety comprises a terminal GlcNAc, the sugar donor moiety is UDP-Gal and the glycosyl transferase is a β1-4-galactosyl transferase. Additional sugars can be added.

The glycosylation reactions include, in addition to the appropriate glycosyl transferase and acceptor, an activated nucleotide sugar that acts as a sugar donor for the glycosyl transferase. The reactions can also include other ingredients that facilitate glycosyl transferase activity. These ingredients can include a divalent cation (e.g., Mg²⁺ or Mn²⁺), materials necessary for ATP regeneration, phosphate ions, and organic solvents. The concentrations or amounts of the various reactants used in the processes depend upon numerous factors including reaction conditions such as temperature and pH value, and the choice and amount of acceptor saccharides to be glycosylated. The reaction medium may also comprise solubilizing detergents (e.g., Triton or SDS) and organic solvents such as methanol or ethanol, if necessary.

The invention also provides host cells (e.g., mammalian cells, yeast cells, bacterial cells, plant cells, fungal cells, archaebacterial cells, insect cells, and/or the like) that are useful for synthesizing a glycoprotein. These host cells contain: a) (optionally) an unnatural amino acid that comprises a saccharide moiety (which may be synthesized by the host cell itself, or be provided exogenously through the culture media or extracellular environment in which the host cell lives); b) an orthogonal tRNA that recognizes a degenerate codon (supra); c) an orthogonal aminoacyl tRNA synthetase (O-RS) that catalyzes attachment of the unnatural amino acid to the orthogonal tRNA; d) (optionally) a polynucleotide that encodes a glycosyl transferase; and e) a polynucleotide sequence that encodes a target/desired polypeptide and comprises at least one degenerate codon that can be preferentially recognized by the O-tRNA.

Also provided by the invention are compositions that include a translation system. The translation systems include an orthogonal tRNA (O-tRNA) and an orthogonal aminoacyl tRNA synthetase (O-RS), wherein the O-RS preferentially aminoacylates the O-tRNA with an unnatural amino acid that comprises a saccharide moiety (e.g., a β-O-GlcNAc-L-serine, a tri-acetyl-β-GlcNAc-serine, a tri-O-acetyl-GalNAc-α-threonine, an α-GalNAc-L-threonine, and/or the like), and the O-tRNA recognizes at least one degenerate codon described above.

As used herein, the term “saccharide moiety” refers to natural and unnatural sugar moieties (i.e., an unnaturally occurring sugar moiety, e.g., a sugar moiety that is modified, e.g., at one or more hydroxyl or amino positions, e.g., dehydroxylated, deaminated, esterified, etc.; 2-deoxyGal is an example of an unnatural sugar moiety).

The term “carbohydrate” refers to compounds having the general formula (CH₂O)_(n), and includes, but is not limited to, e.g., monosaccharides, disaccharides, oligosaccharides and polysaccharides. Oligosaccharides are chains composed of saccharide units, which are alternatively known as sugars. Saccharide units can be arranged in any order and the linkage between two saccharide units can occur in any of approximately ten different ways. The following abbreviations are used herein: Ara=arabinosyl; Fru=fructosyl; Fuc=fucosyl; Gal=galactosyl; GalNAc=N-acetylgalactosaminyl; Glc=glucosyl; GlcNAc=N-acetylglucosaminyl; Man=mannosyl; and NeuAc=sialyl (typically N-acetylneuraminyl).

Oligosaccharides are considered to have a reducing end and a non-reducing end, whether or not the saccharide at the reducing end is in fact a reducing sugar. In accordance with accepted nomenclature, oligosaccharides are depicted herein with the non-reducing end on the left and the reducing end on the right. All oligosaccharides described herein are described with the name or abbreviation for the non-reducing saccharide (e.g., Gal), followed by the configuration of the glycosidic bond (α or β), the ring bond, the ring position of the reducing saccharide involved in the bond, and then the name or abbreviation of the reducing saccharide (e.g., GlcNAc). The linkage between two sugars may be expressed, for example, as 2,3; 2→3; 2-3; or (2,3). Natural and unnatural linkages (e.g., 1-2; 1-3; 1-4; 1-6; 2-3; 2-4; 2-6; etc.) between two sugars are included in the invention. Each saccharide is a pyranose.
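The naming convention described above can be illustrated with a short Python sketch that splits a single disaccharide linkage string into its parts. The function name `parse_linkage`, the `Linkage` tuple, and the regular expression are hypothetical helpers introduced only for illustration; they handle only a single linkage written in one of the separator styles mentioned above and are not part of the invention.

```python
import re
from typing import NamedTuple


class Linkage(NamedTuple):
    """One glycosidic linkage, read left (non-reducing) to right (reducing)."""
    non_reducing: str   # e.g. "Gal"
    anomer: str         # "α" or "β"
    from_pos: int       # ring position on the non-reducing saccharide
    to_pos: int         # ring position on the reducing saccharide
    reducing: str       # e.g. "GlcNAc"


# Illustrative pattern: name, anomer, position, separator (, - or →), position, name.
# Optional parentheses cover the "(2,3)" style.
_LINKAGE = re.compile(
    r"^([A-Za-z][A-Za-z0-9]*)([αβ])\(?(\d)\s*(?:,|-|→)\s*(\d)\)?([A-Za-z][A-Za-z0-9]*)$"
)


def parse_linkage(text: str) -> Linkage:
    """Parse strings such as 'Galβ1,4GlcNAc-' or 'NeuAcα2→3Gal'."""
    cleaned = text.strip().rstrip("-")  # drop the trailing bond hyphen, if any
    match = _LINKAGE.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized linkage notation: {text!r}")
    donor, anomer, from_pos, to_pos, acceptor = match.groups()
    return Linkage(donor, anomer, int(from_pos), int(to_pos), acceptor)


if __name__ == "__main__":
    print(parse_linkage("Galβ1,4GlcNAc-"))
    print(parse_linkage("NeuAcα(2,3)Gal"))
```

Longer chains and branched structures would need a real glycan parser; this sketch only shows how the left-to-right ordering of the name encodes the bond.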

The term “sialic acid” (abbreviated “Sia”) refers to any member of a family of nine-carbon carboxylated sugars. The most common member of the sialic acid family is N-acetyl-neuraminic acid (2-keto-5-acetamido-3,5-dideoxy-D-glycero-D-galactononulopyranos-1-onic acid) (often abbreviated as Neu5Ac, NeuAc, or NANA). A second member of the family is N-glycolyl-neuraminic acid (Neu5Gc or NeuGc), in which the N-acetyl group of NeuAc is hydroxylated. A third sialic acid family member is 2-keto-3-deoxy-nonulosonic acid (KDN) (Nadano et al., J. Biol. Chem. 261: 11550-11557, 1986; Kanamori et al., J. Biol. Chem. 265: 21811-21819, 1990). Also included are 9-substituted sialic acids such as a 9-O—C1-C6 acyl-Neu5Ac like 9-O-lactyl-Neu5Ac or 9-O-acetyl-Neu5Ac, 9-deoxy-9-fluoro-Neu5Ac and 9-azido-9-deoxy-Neu5Ac. For review of the sialic acid family, see, e.g., Varki, Glycobiology 2: 25-40, 1992; Sialic Acids: Chemistry, Metabolism and Function, R. Schauer, Ed. (Springer-Verlag, New York (1992)). The synthesis and use of sialic acid compounds in a sialylation procedure is described in, for example, international application WO 92/16640 (entire contents incorporated herein by reference).

Donor substrates for glycosyl transferases are activated nucleotide sugars. Such activated sugars generally consist of uridine and guanosine diphosphate, and cytidine monophosphate, derivatives of the sugars in which the nucleoside diphosphate or monophosphate serves as a leaving group. Bacterial, plant, and fungal systems can sometimes use other activated nucleotide sugars.

The incorporation of an unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety, can be done to, e.g., tailor changes in protein structure and/or function, e.g., to change size, acidity, nucleophilicity, hydrogen bonding, hydrophobicity, accessibility of protease target sites, target access to a protein moiety, etc. Proteins that include an unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety, can have enhanced, or even entirely new, catalytic or physical properties. For example, the following properties are optionally modified by inclusion of an unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety into a protein: toxicity, biodistribution, structural properties, spectroscopic properties, chemical and/or photochemical properties, catalytic ability, half-life (e.g., serum half-life), ability to react with other molecules, e.g., covalently or noncovalently, and the like. The compositions including proteins that include at least one unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety are useful for, e.g., novel therapeutics, diagnostics, catalytic enzymes, industrial enzymes, binding proteins (e.g., antibodies), and e.g., the study of protein structure and function. See, e.g., Dougherty, (2000) Unnatural Amino Acids as Probes of Protein Structure and Function, Current Opinion in Chemical Biology, 4:645-652.

In one aspect of the invention, a composition includes at least one protein with at least one, e.g., at least about two, three, four, five, six, seven, eight, nine, or at least about ten or more unnatural amino acids, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety, and/or which include another unnatural amino acid. The unnatural amino acids can be the same or different, e.g., there can be 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more different sites in the protein that comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more different unnatural amino acids. In another aspect, a composition includes a protein with at least one, but fewer than all, of a particular amino acid present in the protein substituted with the unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety. For a given protein with more than one unnatural amino acid, the unnatural amino acids can be identical or different (e.g., the protein can include two or more different types of unnatural amino acids, or can include two of the same unnatural amino acid). For a given protein with more than two unnatural amino acids, the unnatural amino acids can be the same, different, or a combination of multiple unnatural amino acids of the same kind with at least one different unnatural amino acid.

Essentially any protein (or portion thereof) that includes an unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety is attached, such as an aldehyde- or keto-derivatized amino acid, or an unnatural amino acid that includes a saccharide moiety (and any corresponding coding nucleic acid, e.g., which includes one or more selector codons) can be produced using the compositions and methods herein. No attempt is made to identify the hundreds of thousands of known proteins, any of which can be modified to include one or more unnatural amino acids, e.g., by tailoring any available mutation methods to include one or more appropriate degenerate codons in a relevant translation system. Common sequence repositories for known proteins include GenBank, EMBL, DDBJ and the NCBI. Other repositories can easily be identified by searching the internet.

Typically, the proteins are, e.g., at least about 60%, 70%, 75%, 80%, 90%, 95%, or at least about 99% or more identical to any available protein (e.g., a therapeutic protein, a diagnostic protein, an industrial enzyme, or portion thereof, and the like), and they comprise one or more unnatural amino acids. Examples of therapeutic, diagnostic, and other proteins that can be modified to comprise one or more unnatural amino acids, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety is attached, or an unnatural amino acid that includes a saccharide moiety, include, but are not limited to, those found in WO 2002/085923, supra. Examples of therapeutic, diagnostic, and other proteins that can be modified to comprise one or more unnatural amino acids that comprise an amino acid, where a saccharide moiety is linked and/or an unnatural amino acid that includes a saccharide moiety include, but are not limited to, e.g., Alpha-1 antitrypsin, Angiostatin, Antihemolytic factor, antibodies (further details on antibodies are found below), Apolipoprotein, Apoprotein, Atrial natriuretic factor, Atrial natriuretic polypeptide, Atrial peptides, C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG), Calcitonin, CC chemokines (e.g., Monocyte chemoattractant protein-1, Monocyte chemoattractant protein-2, Monocyte chemoattractant protein-3, Monocyte inflammatory protein-1 alpha, Monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262), CD40 ligand, C-kit Ligand, Collagen, Colony stimulating factor (CSF), Complement factor 5a, Complement inhibitor, Complement receptor 1, cytokines (e.g., epithelial Neutrophil Activating Peptide-78, GROα/MGSA, GROβ, GROγ, MIP-1α, MIP-1δ, MCP-1), Epidermal Growth Factor (EGF), Erythropoietin (“EPO”, representing a preferred target for modification by the incorporation of one or more unnatural amino acids), Exfoliating toxins A and B, Factor IX, Factor VII, Factor VIII, Factor X, Fibroblast Growth Factor (FGF), Fibrinogen, Fibronectin, G-CSF, GM-CSF, Glucocerebrosidase, Gonadotropin, growth factors, Hedgehog proteins (e.g., Sonic, Indian, Desert), Hemoglobin, Hepatocyte Growth Factor (HGF), Hirudin, Human serum albumin, Insulin, Insulin-like Growth Factor (IGF), interferons (e.g., IFN-α, IFN-β, IFN-γ), interleukins (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, etc.), Keratinocyte Growth Factor (KGF), Lactoferrin, leukemia inhibitory factor, Luciferase, Neurturin, Neutrophil inhibitory factor (NIF), oncostatin M, Osteogenic protein, Parathyroid hormone, PD-ECSF, PDGF, peptide hormones (e.g., Human Growth Hormone), Pleiotropin, Protein A, Protein G, Pyrogenic exotoxins A, B, and C, Relaxin, Renin, SCF, Soluble complement receptor I, Soluble I-CAM 1, Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15), Soluble TNF receptor, Somatomedin, Somatostatin, Somatotropin, Streptokinase, Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Superoxide dismutase (SOD), Toxic shock syndrome toxin (TSST-1), Thymosin alpha 1, Tissue plasminogen activator, Tumor necrosis factor beta (TNF beta), Tumor necrosis factor receptor (TNFR), Tumor necrosis factor-alpha (TNF alpha), Vascular Endothelial Growth Factor (VEGF), Urokinase and many others.

One class of proteins that can be made using the compositions and methods for in vivo incorporation of an unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety can be attached, or an unnatural amino acid that includes a saccharide moiety described herein, includes transcriptional modulators or a portion thereof. Example transcriptional modulators include genes and transcriptional modulator proteins that modulate cell growth, differentiation, regulation, or the like. Transcriptional modulators are found in prokaryotes, viruses, and eukaryotes, including fungi, plants, yeasts, insects, and animals, including mammals, providing a wide range of therapeutic targets. It will be appreciated that expression and transcriptional activators regulate transcription by many mechanisms, e.g., by binding to receptors, stimulating a signal transduction cascade, regulating expression of transcription factors, binding to promoters and enhancers, binding to proteins that bind to promoters and enhancers, unwinding DNA, splicing pre-mRNA, polyadenylating RNA, and degrading RNA.

One class of proteins of the invention (e.g., proteins with one or more unnatural amino acids that comprise an amino acid, where a saccharide moiety is linked, and/or an unnatural amino acid that includes a saccharide moiety) includes expression activators such as cytokines, inflammatory molecules, growth factors, their receptors, and oncogene products, e.g., interleukins (e.g., IL-1, IL-2, IL-8, etc.), interferons, FGF, IGF-I, IGF-II, PDGF, TNF, TGF-α, TGF-β, EGF, KGF, SCF/c-Kit, CD40L/CD40, VLA-4/VCAM-1, ICAM-1/LFA-1, and hyalurin/CD44; signal transduction molecules and corresponding oncogene products, e.g., Mos, Ras, Raf, and Met; and transcriptional activators and suppressors, e.g., p53, Tat, Fos, Myc, Jun, Myb, Rel, and steroid hormone receptors such as those for estrogen, progesterone, testosterone, aldosterone, the LDL receptor ligand and corticosterone.

Enzymes (e.g., industrial enzymes) or portions thereof with at least one unnatural amino acid, e.g., an unnatural amino acid comprising a moiety where a saccharide moiety is attached, or an unnatural amino acid that includes a saccharide moiety, are also provided by the invention. Examples of enzymes include, but are not limited to, e.g., amidases, amino acid racemases, acylases, dehalogenases, dioxygenases, diarylpropane peroxidases, epimerases, epoxide hydrolases, esterases, isomerases, kinases, glucose isomerases, glycosidases, glycosyl transferases, haloperoxidases, monooxygenases (e.g., p450s), lipases, lignin peroxidases, nitrile hydratases, nitrilases, proteases, phosphatases, subtilisins, transaminases, and nucleases.

Many proteins that can be modified according to the invention are commercially available (see, e.g., the Sigma BioSciences catalogue and price list), and the corresponding protein sequences and genes and, typically, many variants thereof, are well-known (see, e.g., Genbank). Any of them can be modified by the insertion of one or more unnatural amino acids that comprise an amino acid, where a saccharide moiety is linked, or that include an unnatural amino acid that includes a saccharide moiety according to the invention, e.g., to alter the protein with respect to one or more therapeutic, diagnostic or enzymatic properties of interest. Examples of therapeutically relevant properties include serum half-life, shelf half-life, stability, immunogenicity, therapeutic activity, detectability (e.g., by the inclusion of reporter groups (e.g., labels or label binding sites) in the unnatural amino acids), specificity, reduction of LD50 or other side effects, ability to enter the body through the gastric tract (e.g., oral availability), or the like. Examples of relevant diagnostic properties include shelf half-life, stability, diagnostic activity, detectability, specificity, or the like. Examples of relevant enzymatic properties include shelf half-life, stability, specificity, enzymatic activity, production capability, or the like.

A variety of other proteins can also be modified to include one or more unnatural amino acids of the invention. For example, the invention can include substituting one or more natural amino acids in one or more vaccine proteins with an unnatural amino acid that comprises an amino acid, where a saccharide moiety is linked, or by incorporating an unnatural amino acid that includes a saccharide moiety, e.g., in proteins from infectious fungi, e.g., Aspergillus, Candida species; bacteria, particularly E. coli, which serves as a model for pathogenic bacteria, as well as medically important bacteria such as Staphylococci (e.g., aureus), or Streptococci (e.g., pneumoniae); protozoa such as sporozoa (e.g., Plasmodia), rhizopods (e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia, etc.); viruses such as (+) RNA viruses (examples include Poxviruses e.g., vaccinia; Picornaviruses, e.g. polio; Togaviruses, e.g., rubella; Flaviviruses, e.g., HCV; and Coronaviruses), (−) RNA viruses (e.g., Rhabdoviruses, e.g., VSV; Paramyxoviruses, e.g., RSV; Orthomyxoviruses, e.g., influenza; Bunyaviruses; and Arenaviruses), dsDNA viruses (Reoviruses, for example), RNA to DNA viruses, i.e., Retroviruses, e.g., HIV and HTLV, and certain DNA to RNA viruses such as Hepatitis B.

Agriculturally related proteins such as insect resistance proteins (e.g., the Cry proteins), starch and lipid production enzymes, plant and insect toxins, toxin-resistance proteins, Mycotoxin detoxification proteins, plant growth enzymes (e.g., Ribulose 1,5-Bisphosphate Carboxylase/Oxygenase, “RUBISCO”), lipoxygenase (LOX), and Phosphoenolpyruvate (PEP) carboxylase are also suitable targets for modification by incorporation of unnatural amino acids and/or saccharide additions of the invention.

In certain embodiments, the protein or polypeptide of interest (or portion thereof) in the methods and/or compositions of the invention is encoded by a nucleic acid. Typically, the nucleic acid comprises at least one degenerate codon, e.g., at least about two, three, four, five, six, seven, eight, nine, or at least about ten or more degenerate codons.
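As a minimal sketch of how one might check how many times a chosen codon occurs in a coding sequence, the Python function below counts in-frame occurrences of a codon. The function name `count_in_frame_codons`, the example sequence, and the choice of UUU as the codon of interest are assumptions made purely for illustration; the sequence is assumed to start at the first base of the reading frame.

```python
def count_in_frame_codons(coding_seq: str, codon: str) -> int:
    """Count in-frame occurrences of `codon` in `coding_seq`.

    Both arguments may be given as DNA (T) or RNA (U); they are normalized
    to upper-case RNA before comparison. Reading starts at index 0.
    """
    seq = coding_seq.upper().replace("T", "U")
    target = codon.upper().replace("T", "U")
    if len(target) != 3:
        raise ValueError("a codon must be exactly three nucleotides long")
    # Step through the sequence three bases at a time, ignoring any
    # incomplete trailing codon.
    return sum(1 for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] == target)


if __name__ == "__main__":
    # Hypothetical five-codon message: AUG UUU UUC UUU UAA
    message = "AUGUUUUUCUUUUAA"
    print(count_in_frame_codons(message, "UUU"))
    print(count_in_frame_codons(message, "UUC"))
```

Counting only in-frame triplets matters here because the same three letters can also appear across codon boundaries, where they are not read as a codon.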

Thus the above-described artificial (e.g., man-made, and not naturally occurring) polypeptides and polynucleotides are also features of the invention.

An artificial polynucleotide of the invention includes, e.g., (a) a polynucleotide comprising a nucleotide sequence encoding an artificial polypeptide of the invention; (b) a polynucleotide that is complementary to or that encodes a polynucleotide sequence of (a); (c) a nucleic acid that hybridizes to a polynucleotide of (a) or (b) under stringent conditions over substantially the entire length of the nucleic acid; (d) a polynucleotide that is at least about 95%, preferably at least about 98% identical to a polynucleotide of (a), (b), or (c); and, (e) a polynucleotide comprising a conservative variation of (a), (b), (c), or (d).
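The percent-identity thresholds mentioned above can be illustrated with a simple calculation over two sequences that have already been aligned to the same length. This is only a sketch under stated assumptions: the text does not specify an alignment method or how gaps are scored, so the function below (the hypothetical `percent_identity`) simply counts matching non-gap columns and divides by the alignment length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the inputs are pre-aligned (equal length, '-' marks a gap).
    A gap never counts as a match, even against another gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    candidate = "ACGTACGTAA"   # differs at the last position only
    print(percent_identity(reference, candidate))
    print(meets_identity_threshold(reference, candidate, 95.0))
```

Other conventions (for example, dividing by the length of the shorter sequence or excluding gapped columns) give different numbers, so any real comparison should state which definition is used.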

Because the glycopolypeptides of the invention provide a variety of new polypeptide sequences (e.g., comprising an unnatural amino acid that comprises an amino acid, where a saccharide moiety can be linked, or an unnatural amino acid that includes a saccharide moiety in the case of proteins synthesized in the translation systems herein, or, e.g., in the case of the novel synthetases, novel sequences of standard amino acids), the glycopolypeptides also provide new structural features which can be recognized, e.g., in immunological assays. Thus antibodies and antisera that are specifically immunoreactive with an artificial polypeptide of the invention are also provided. In other words, the generation of antisera, which specifically bind the polypeptides of the invention, as well as the polypeptides which are bound by such antisera, are a feature of the invention.

Such antibodies or antisera preferably have minimal or no cross-reactivity with the wild-type version of the antigen, which does not contain the unnatural amino acids.

Unnatural amino acids are generally described above. Of particular interest for making glycoproteins of the invention are unnatural amino acids in which R in Formula I includes a moiety that can react with a reactive group that is attached to a saccharide moiety, to link the saccharide moiety to a protein that includes the unnatural amino acid. Suitable R groups include, for example, keto-, azido-, hydroxyl-, hydrazine, cyano-, halo-, aminooxy-, alkenyl, alkynyl, carbonyl, ether, thiol, seleno-, sulfonyl-, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, ester, thioacid, thioester, hindered ester, hydroxylamine, amine, and the like, or any combination thereof. In some embodiments, the unnatural amino acids have a photoactivatable cross-linker.

In addition to unnatural amino acids that contain novel side chains, unnatural amino acids also optionally comprise modified backbone structures, e.g., as illustrated by the structures of Formula II and III:

wherein Z typically comprises OH, NH₂, SH, NH—R′, or S—R′; X and Y, which can be the same or different, typically comprise S or O, and R and R′, which are optionally the same or different, are typically selected from the same list of constituents for the R group described above for the unnatural amino acids having Formula I as well as hydrogen. For example, unnatural amino acids of the invention optionally comprise substitutions in the amino or carboxyl group as illustrated by Formulas II and III. Unnatural amino acids of this type include, but are not limited to, α-hydroxy acids, α-thioacids, and α-aminothiocarboxylates, e.g., with side chains corresponding to the common twenty natural amino acids or unnatural side chains. In addition, substitutions at the α-carbon optionally include L, D, or α-α-disubstituted amino acids such as D-glutamate, D-alanine, D-methyl-O-tyrosine, aminobutyric acid, and the like. Other structural alternatives include cyclic amino acids, such as proline analogues as well as 3-, 4-, 6-, 7-, 8-, and 9-membered ring proline analogues, β and γ amino acids such as substituted β-alanine and γ-amino butyric acid.

For example, many unnatural amino acids are based on natural amino acids, such as tyrosine, glutamine, phenylalanine, and the like. Tyrosine analogs include para-substituted tyrosines, ortho-substituted tyrosines, and meta-substituted tyrosines, wherein the substituted tyrosine comprises an acetyl group, a benzoyl group, an amino group, a hydrazine, a hydroxyamine, a thiol group, a carboxy group, an isopropyl group, a methyl group, a C6-C20 straight chain or branched hydrocarbon, a saturated or unsaturated hydrocarbon, an O-methyl group, a polyether group, a nitro group, or the like. In addition, multiply substituted aryl rings are also contemplated. Glutamine analogs of the invention include, but are not limited to, α-hydroxy derivatives, γ-substituted derivatives, cyclic derivatives, and amide substituted glutamine derivatives. Example phenylalanine analogs include, but are not limited to, meta-substituted, ortho-substituted, and/or para-substituted phenylalanines, wherein the substituent comprises a hydroxy group, a methoxy group, a methyl group, an allyl group, an aldehyde or keto group, or the like.

Specific examples of unnatural amino acids include, but are not limited to, p-acetyl-L-phenylalanine, O-methyl-L-tyrosine, an L-3-(2-naphthyl)alanine, a 3-methyl-phenylalanine, an O-4-allyl-L-tyrosine, a 4-propyl-L-tyrosine, a tri-O-acetyl-GlcNAcβ-serine, β-O-GlcNAc-L-serine, a tri-O-acetyl-GalNAc-α-threonine, an α-GalNAc-L-threonine, an L-Dopa, a fluorinated phenylalanine, an isopropyl-L-phenylalanine, a p-azido-L-phenylalanine, a p-acyl-L-phenylalanine, a p-benzoyl-L-phenylalanine, an L-phosphoserine, a phosphonoserine, a phosphonotyrosine, a p-iodo-phenylalanine, a p-bromophenylalanine, a p-amino-L-phenylalanine, those listed below, or elsewhere herein, and the like. The structures of a variety of unnatural amino acids are provided in, for example, FIGS. 17, 18, 19, 26, and 29 of WO 2002/085923 (incorporated herein by reference).

Unnatural amino acids suitable for use in the methods of the invention also include those that have a saccharide moiety attached to the amino acid side chain. In one embodiment, an unnatural amino acid with a saccharide moiety includes a serine or threonine amino acid with a Man, GalNAc, Glc, Fuc, or Gal moiety. Examples of unnatural amino acids that include a saccharide moiety include, but are not limited to, e.g., a tri-O-acetyl-GlcNAcβ-serine, a β-O-GlcNAc-L-serine, a tri-O-acetyl-GalNAc-α-threonine, an α-GalNAc-L-threonine, an O-Man-L-serine, a tetra-acetyl-O-Man-L-serine, an O-GalNAc-L-serine, a tri-acetyl-O-GalNAc-L-serine, a Glc-L-serine, a tetraacetyl-Glc-L-serine, a fuc-L-serine, a tri-acetyl-fuc-L-serine, an O-Gal-L-serine, a tetra-acetyl-O-Gal-L-serine, a beta-O-GlcNAc-L-threonine, a tri-acetyl-beta-GlcNAc-L-threonine, an O-Man-L-threonine, a tetra-acetyl-O-Man-L-threonine, an O-GalNAc-L-threonine, a tri-acetyl-O-GalNAc-L-threonine, a Glc-L-threonine, a tetraacetyl-Glc-L-threonine, a fuc-L-threonine, a tri-acetyl-fuc-L-threonine, an O-Gal-L-threonine, a tetra-acetyl-O-Gal-L-threonine, and the like. The invention includes unprotected and acetylated forms of the above. See also WO 03/031464 A2, entitled “Remodeling and Glycoconjugation of Peptides”; and, U.S. Pat. No. 6,331,418, entitled “Saccharide Compositions, Methods and Apparatus for their synthesis.” (all incorporated herein by reference).

Many of the unnatural amino acids provided above are commercially available, e.g., from Sigma (USA) or Aldrich (Milwaukee, Wis., USA). Those that are not commercially available are optionally synthesized as provided in the examples of US 2004/138106 A1 (incorporated herein by reference) or using standard methods known to those of skill in the art. For organic synthesis techniques, see, e.g., Organic Chemistry by Fessenden and Fessenden, (1982, Second Edition, Willard Grant Press, Boston Mass.); Advanced Organic Chemistry by March (Third Edition, 1985, Wiley and Sons, New York); and Advanced Organic Chemistry by Carey and Sundberg (Third Edition, Parts A and B, 1990, Plenum Press, New York). See also WO 02/085923 for additional synthesis of unnatural amino acids.

For example, meta-substituted phenylalanines are synthesized in aprocedure as outlined in WO 02/085923 (see, e.g., FIG. 14 of thepublication). Typically, NBS (N-bromosuccinimide) is added to ameta-substituted methylbenzene compound to give a meta-substitutedbenzyl bromide, which is then reacted with a malonate compound to givethe meta substituted phenylalanine. Typical substituents used for themeta position include, but are not limited to, ketones, methoxy groups,alkyls, acetyls, and the like. For example, 3-acetyl-phenylalanine ismade by reacting NBS with a solution of 3-methylacetophenone. For moredetails see the examples below. A similar synthesis is used to produce a3-methoxy phenylalanine. The R group on the meta position of the benzylbromide in that case is —OCH₃. See, e.g., Matsoukas et al., J. Med.Chem., 1995, 38, 4660-4669.

In some embodiments, the design of unnatural amino acids is biased byknown information about the active sites of synthetases, e.g.,orthogonal tRNA synthetases used to aminoacylate an orthogonal tRNA. Forexample, three classes of glutamine analogs are provided, includingderivatives substituted at the nitrogen of amide (1), a methyl group atthe γ-position (2), and a N-Cγ-cyclic derivative (3). Based upon thex-ray crystal structure of E. coli GlnRS, in which the key binding siteresidues are homologous to yeast GlnRS, the analogs were designed tocomplement an array of side chain mutations of residues within a 10 Åshell of the side chain of glutamine, e.g., a mutation of the activesite Phe233 to a small hydrophobic amino acid might be complemented byincreased steric bulk at the Cγ position of Gln.

For example, N-phthaloyl-L-glutamic 1,5-anhydride (compound number 4 in FIG. 23 of WO 02/085923) is optionally used to synthesize glutamine analogs with substituents at the nitrogen of the amide. See, e.g., King & Kidd, A New Synthesis of Glutamine and of γ-Dipeptides of Glutamic Acid from Phthalylated Intermediates. J. Chem. Soc., 3315-3319, 1949; Friedman & Chatterji, Synthesis of Derivatives of Glutamine as Model Substrates for Anti-Tumor Agents. J. Am. Chem. Soc. 81, 3750-3752, 1959; Craig et al., Absolute Configuration of the Enantiomers of 7-Chloro-4[[4-(diethylamino)-1-methylbutyl]amino]quinoline (Chloroquine). J. Org. Chem. 53, 1167-1170, 1988; and Azoulay et al., Glutamine analogues as Potential Antimalarials, Eur. J. Med. Chem. 26, 201-5, 1991. The anhydride is typically prepared from glutamic acid by first protecting the amine as the phthalimide, followed by refluxing in acetic acid. The anhydride is then opened with a number of amines, resulting in a range of substituents at the amide. Deprotection of the phthaloyl group with hydrazine affords a free amino acid as shown in FIG. 23 of WO 02/085923.

Substitution at the γ-position is typically accomplished via alkylationof glutamic acid. See, e.g., Koskinen & Rapoport, Synthesis of4-Substituted Prolines as Conformationally Constrained Amino AcidAnalogues. J. Org. Chem. 54, 1859-1866, 1989. A protected amino acid,e.g., as illustrated by compound number 5 in FIG. 24 of WO 02/085923, isoptionally prepared by first alkylation of the amino moiety with9-bromo-9-phenylfluorene (PhflBr) (see, e.g., Christie & Rapoport,Synthesis of Optically Pure Pipecolates from L-Asparagine. Applicationto the Total Synthesis of (+)-Apovincamine through Amino AcidDecarbonylation and Iminium Ion Cyclization. J. Org. Chem. 1989,1859-1866, 1985) and then esterification of the acid moiety usingO-tert-butyl-N,N′-diisopropylisourea. Addition of KN(Si(CH₃)₃)₂regioselectively deprotonates at the α-position of the methyl ester toform the enolate, which is then optionally alkylated with a range ofalkyl iodides. Hydrolysis of the t-butyl ester and Phfl group gave thedesired γ-methyl glutamine analog (Compound number 2 in FIG. 24 of WO02/085923).

An N-Cγ cyclic analog, as illustrated by Compound number 3 in FIG. 25 ofWO 02/085923, is optionally prepared in 4 steps from Boc-Asp-Ot-Bu aspreviously described. See, e.g., Barton et al., Synthesis of Novelα-Amino-Acids and Derivatives Using Radical Chemistry: Synthesis of L-and D-α-Amino-Adipic Acids, L-α-aminopimelic Acid and AppropriateUnsaturated Derivatives. Tetrahedron Lett. 43, 4297-4308, 1987, andSubasinghe et al., Quisqualic acid analogues: synthesis ofbeta-heterocyclic 2-aminopropanoic acid derivatives and their activityat a novel quisqualate-sensitized site. J. Med. Chem. 35 4602-7, 1992.Generation of the anion of the N-t-Boc-pyrrolidinone, pyrrolidinone, oroxazolidone followed by the addition of the compound 7, as shown in FIG.25, results in a Michael addition product. Deprotection with TFA thenresults in the free amino acids.

In addition to the above unnatural amino acids, a library of tyrosine analogs has also been designed. Based upon the crystal structure of B. stearothermophilus TyrRS, whose active site is highly homologous to that of the M. jannaschii synthetase, residues within a 10 Å shell of the aromatic side chain of tyrosine were mutated (Y32, G34, L65, Q155, D158, A167, Y32 and D158). The library of tyrosine analogs, as shown in FIG. 26 of WO 02/085923, has been designed to complement an array of substitutions to these active site amino acids. These include a variety of phenyl substitution patterns, which offer different hydrophobic and hydrogen-bonding properties. Tyrosine analogs are optionally prepared using the general strategy illustrated by WO 02/085923 (see, e.g., FIG. 27 of the publication). For example, an enolate of diethyl acetamidomalonate is optionally generated using sodium ethoxide. A desired tyrosine analog can then be prepared by adding an appropriate benzyl bromide followed by hydrolysis.

Many biosynthetic pathways already exist in cells for the production of amino acids and other compounds. While a biosynthetic method for a particular unnatural amino acid may not exist in nature, e.g., in E. coli, the invention provides such methods. For example, biosynthetic pathways for unnatural amino acids are optionally generated in E. coli by adding new enzymes or modifying existing E. coli pathways. Additional new enzymes are optionally naturally occurring enzymes or artificially evolved enzymes. For example, the biosynthesis of p-aminophenylalanine (as presented, e.g., in WO 02/085923) relies on the addition of a combination of known enzymes from other organisms. The genes for these enzymes can be introduced into a cell, e.g., an E. coli cell, by transforming the cell with a plasmid comprising the genes. The genes, when expressed in the cell, provide an enzymatic pathway to synthesize the desired compound. Examples of the types of enzymes that are optionally added are provided in the examples below. Additional enzyme sequences are found, e.g., in Genbank. Artificially evolved enzymes are also optionally added into a cell in the same manner. In this manner, the cellular machinery and resources of a cell are manipulated to produce unnatural amino acids.

A variety of methods are available for producing novel enzymes for use in biosynthetic pathways or for evolution of existing pathways. For example, recursive recombination, e.g., as developed by Maxygen, Inc., is optionally used to develop novel enzymes and pathways. See, e.g., Stemmer 1994, “Rapid evolution of a protein in vitro by DNA shuffling,” Nature 370(4): 389-391; and Stemmer, 1994, “DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution,” Proc. Natl. Acad. Sci. USA. 91: 10747-10751. Similarly, DesignPath™, developed by Genencor, is optionally used for metabolic pathway engineering, e.g., to engineer a pathway to create an unnatural amino acid in E. coli. This technology reconstructs existing pathways in host organisms using a combination of new genes, e.g., identified through functional genomics, and molecular evolution and design. Diversa Corporation also provides technology for rapidly screening libraries of genes and gene pathways, e.g., to create new pathways.

Typically, the biosynthesis methods of the invention, e.g., the pathway to create p-aminophenylalanine (pAF) from chorismate, do not affect the concentration of other amino acids produced in the cell. For example, a pathway used to produce pAF from chorismate produces pAF in the cell while the concentrations of other aromatic amino acids typically produced from chorismate are not substantially affected. Typically, the unnatural amino acid produced with an engineered biosynthetic pathway of the invention is produced in a concentration sufficient for efficient protein biosynthesis, e.g., a natural cellular amount, but not to such a degree as to affect the concentration of the other amino acids or exhaust cellular resources. Typical concentrations produced in vivo in this manner are about 10 mM to about 0.05 mM. Once a bacterium is transformed with a plasmid comprising the genes used to produce enzymes desired for a specific pathway and a twenty-first amino acid, e.g., pAF, dopa, O-methyl-L-tyrosine, or the like, is generated, in vivo selections are optionally used to further optimize the production of the unnatural amino acid for both ribosomal protein synthesis and cell growth.
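As a rough illustration of what the stated concentration range means in mass terms, the following sketch converts millimolar values to mg/L for pAF. The molecular weight of about 180.2 g/mol (C₉H₁₂N₂O₂) and the helper name `mm_to_mg_per_l` are assumptions introduced here for illustration and are not taken from the specification.

```python
# Minimal sketch: convert a millimolar concentration to mg/L.
# PAF_MW_G_PER_MOL is an assumed value for p-aminophenylalanine
# (C9H12N2O2); it is not stated in the specification.
PAF_MW_G_PER_MOL = 180.2


def mm_to_mg_per_l(conc_mm: float, mw_g_per_mol: float) -> float:
    """mmol/L multiplied by g/mol gives mg/L."""
    return conc_mm * mw_g_per_mol


# Bounds of the "about 10 mM to about 0.05 mM" range quoted above.
upper_mg_per_l = mm_to_mg_per_l(10.0, PAF_MW_G_PER_MOL)
lower_mg_per_l = mm_to_mg_per_l(0.05, PAF_MW_G_PER_MOL)
print(f"{lower_mg_per_l:.2f} mg/L to {upper_mg_per_l:.0f} mg/L")
```

Under these assumptions the range corresponds to roughly 9 mg/L to 1.8 g/L of pAF.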

One protein therapeutic that can benefit from this aspect of the invention is Genzyme Corporation's (Cambridge, Mass.) Cerezyme® (imiglucerase for injection), which is an enzymatically active recombinant glucocerebrosidase for treating Gaucher's disease. Gaucher's disease is an autosomal recessive lysosomal storage disorder characterized by a deficiency in a lysosomal enzyme, glucocerebrosidase (“GCR”), which hydrolyzes the glycolipid glucocerebroside. In Gaucher's patients, deficiency in this enzyme causes the glycolipid glucocerebroside, which arises primarily from degradation of glucosphingolipids from membranes of white blood cells and senescent red blood cells, to accumulate in large quantities in lysosomes of phagocytic cells, mainly in the liver, spleen and bone marrow. Clinical manifestations of the disease include splenomegaly, hepatomegaly, skeletal disorders, thrombocytopenia and anemia.

Prior treatments for patients suffering from this disease includeadministration of analgesics for relief of bone pain, blood and platelettransfusions, and in severe cases, splenectomy. Joint replacements maybe necessary for patients who experience bone erosion. Brady (NewEngland Journal of Medicine 275: 312, 1966) proposed enzyme replacementtherapy with GCR as a treatment for Gaucher's disease. However, Furbishet al. (Biochem. Biophys. Research Communications 81: 1047, 1978)observed that infused human placental GCR does not reach the site atwhich it is active, namely lysosomes of cells of the reticuloendothelialsystem, but rather is taken up by hepatocytes. Furbish et al. (Biochem.Biophys. Acta 673: 425, 1981) improved delivery of human placental GCRto phagocytic cells by treating the GCR sequentially with neuraminidase,β-galactosidase and β-N-acetylhexosaminidase, and demonstrated that thetreated GCR was taken up more efficiently by rat Kupffer cells thanuntreated protein. Sorge et al. (Proc. Nat'l. Acad. Sci., USA 82: 7289,1985) and Tsuji et al. (J. Biol. Chem. 261: 50, 1986) describe cloningand sequencing of a gene encoding human GCR.

Genzyme Corp. developed and produced in mammalian cell culture (CHO, or Chinese Hamster Ovary cells) a recombinant analogue of the human enzyme β-Glucocerebrosidase (β-D-glucosyl-N-acylsphingosine glucohydrolase, E.C. 3.2.1.45), which it calls Cerezyme® (imiglucerase for injection). Purified imiglucerase is a monomeric glycoprotein of 497 amino acids, containing 4 N-linked glycosylation sites (Mr=60,430). Imiglucerase differs from placental glucocerebrosidase by one amino acid at position 495, where histidine is substituted for arginine. The oligosaccharide chains at the glycosylation sites have been modified to terminate in mannose sugars. The modified carbohydrate structures on imiglucerase are somewhat different from those on placental glucocerebrosidase. These mannose-terminated oligosaccharide chains of imiglucerase are specifically recognized by endocytic carbohydrate receptors on macrophages, the cells that accumulate lipid in Gaucher disease. See U.S. Pat. Nos. 5,236,838 and 5,549,892. In clinical trials, Cerezyme® improved anemia and thrombocytopenia, reduced spleen and liver size, and decreased cachexia to a degree similar to that observed with Ceredase® (alglucerase injection).

One problem with Cerezyme® (imiglucerase for injection) is its short apparent serum half-life. During one-hour intravenous infusions of four doses (7.5, 15, 30, 60 U/kg) of Cerezyme® (imiglucerase for injection), steady-state enzymatic activity was achieved by 30 minutes. However, following infusion, plasma enzymatic activity declined rapidly with a half-life ranging from 3.6 to 10.4 minutes. Plasma clearance ranged from 9.8 to 20.3 mL/min/kg (mean±S.D., 14.5±4.0 mL/min/kg). The volume of distribution corrected for weight ranged from 0.09 to 0.15 L/kg (0.12±0.02 L/kg). These variables do not appear to be influenced by dose or duration of infusion. The pharmacokinetics of Cerezyme® do not appear to be different from those of placental-derived alglucerase (Ceredase®). The short half-life makes it necessary to administer relatively large amounts of Cerezyme® (imiglucerase for injection) to the patient, especially in long-term treatment, which can become quite expensive. In fact, Cerezyme® treatment generally requires life-long intravenous infusions at least once every 2 weeks, making it inconvenient for most patients, and prohibitively expensive (and therefore unavailable) to patients in poor countries.
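The reported pharmacokinetic values can be cross-checked against one another. The sketch below assumes simple one-compartment, first-order elimination, for which t½ = ln(2) × Vd / CL; that model and the function name `half_life_min` are assumptions made here for illustration and are not stated in the specification.

```python
import math


def half_life_min(vd_l_per_kg: float, cl_ml_per_min_per_kg: float) -> float:
    """Estimate elimination half-life (minutes) as ln(2) * Vd / CL.

    Assumes one-compartment, first-order elimination (an assumption,
    not a model given in the text).
    """
    vd_ml_per_kg = vd_l_per_kg * 1000.0  # convert L/kg to mL/kg
    return math.log(2) * vd_ml_per_kg / cl_ml_per_min_per_kg


# Mean values quoted above: Vd = 0.12 L/kg, CL = 14.5 mL/min/kg.
mean_t_half = half_life_min(0.12, 14.5)
print(f"{mean_t_half:.1f} min")
```

Under this assumption the mean values give a half-life of about 5.7 minutes, which falls inside the reported range of 3.6 to 10.4 minutes.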

The instant invention can be used to incorporate unnatural amino acid(s) into the recombinant Cerezyme® and increase its half-life without substantially losing its intended bioactivity, thus significantly reducing the amount of enzyme needed per patient over a given treatment period. This will reduce the cost and/or increase the profit margin, resulting in a cheaper, if not better, therapeutic that is more affordable.

D. Multi-Drug Immunoconjugates

The global market for monoclonal antibody therapeutics reached a totalof $7.2 billion in 2003. The market has been growing at an impressivecompound average annual growth rate of 53% over the previous five years,and is estimated to reach US$26 billion by the end of the decade(average annual growth rate of 18%).

More than 270 industry antibody R&D projects related to cancer therapy have been identified. Among them, there are almost 100 industry-related R&D projects utilizing conjugated antibodies as a therapeutic strategy, some of which are already in different phases of clinical development (see Monoclonal Antibody Therapeutics: Current Market Dynamics & Future Outlook, Research and Markets Ltd, 2004; Improved Monoclonals on the Rise, Research and Markets Ltd, 2004; Anticancer Monoclonal Antibody Database, Bioportfolio, 2003).

Immunoconjugation may be used to increase the therapeutic efficacies ofantibodies. However, current technologies allow attachment of only asingle type of drug to an antibody. This is primarily due to thelimitations in the scope of chemistries available in the set of naturalamino acids, which do not allow precise control over theimmunoconjugation processes.

Attempts to attach multiple drugs on an antibody using currenttechnologies lead to significant heterogeneity from molecule tomolecule, and inconsistencies from lot to lot. This is far from ideal inthe context of tumor therapies, since the best strategy to treat tumorsis frequently through using cocktails of drugs.

Unnatural amino acids can be used to provide a wide variety of new chemistries to attach drugs site-specifically, thus enabling the provision of tumor-targeted, multi-drug regimens to cancer patients. For example, the instant methods can be used to produce immunoconjugates either by attaching a single type of drug site-specifically onto antibodies and antibody fragments to overcome issues related to heterogeneity, or by attaching multiple drug-types site-specifically onto antibodies and antibody fragments in a stoichiometrically controlled manner. In other words, the methods of the instant invention can be used to design a novel class of immunoconjugates that carry a combination of drugs that can be delivered simultaneously and specifically to the tumor, where the therapeutic molecules in the medicament are highly homogeneous, with lot-to-lot consistency. The major advantages of such immunoconjugates include:

-   Simultaneous targeted delivery of multiple drugs that act synergistically in killing tumor cells
-   Combining drugs that act in different phases of the cell cycle to increase the number of cells exposed to cytotoxic effects
-   Focused delivery of the cytotoxic agents to tumor cells maximizing its antitumor effect
-   Minimized exposure to normal tissue
-   Precise control over drug payloads and drug ratios leading to homogenous final products

For example, EP0328147B1 describes novel immunoconjugates, methods fortheir production, pharmaceutical compositions and method for deliveringcytotoxic anthracyclines to a selected population of cells desired to beeliminated. More particularly, the invention relates to immunoconjugatescomprised of an antibody reactive with a selected cell population to beeliminated, the antibody having a number of cytotoxic anthracyclinemolecules covalently linked to its structure. Each anthracyclinemolecule is conjugated to the antibody via a linker arm, theanthracycline being bound to that linker via an acid-sensitiveacylhydrazone bond at the 13-keto position of the anthracycline. Apreferred embodiment of the invention relates to an adriamycinimmunoconjugate wherein adriamycin is attached to the linker arm throughan acylhydrazone bond at the 13-keto position. The linker additionallycontains a disulfide or thioether linkage as part of the antibodyattachment to the immunoconjugate. The immunoconjugates and methods ofthe invention are useful in antibody-mediated drug delivery systems forthe preferential killing of a selected cell population in the treatmentof diseases such as cancers and other tumors, non-cytocidal viral orother pathogenic infections, and autoimmune disorders.

In that particular example, the antibody-drug linkage is limited to a disulfide or a thioether bond, which in general will likely lead to the heterogeneity and inconsistency problem described above. In addition, there is little control, if any, over the attachment of multiple drugs. The instant invention allows multiple unnatural amino acids with different chemistry to be incorporated at different pre-determined positions of the antibody or its fragment, thus allowing multiple drug molecules to be site-specifically attached to the immunoconjugate.

Thus the invention provides an immunoconjugate comprising an antibody (or its functional fragment) specific for a target (e.g., a target cell), the antibody (or fragment or functional equivalent thereof) conjugated, at specific, pre-determined positions, with two or more therapeutic molecules, wherein each of the positions comprises an unnatural amino acid. In certain embodiments, the antibody fragments are F(ab′)₂, Fab′, Fab, or Fv fragments.

In certain embodiments, the two or more therapeutic molecules are thesame. In certain embodiments, the two or more therapeutic molecules aredifferent. In certain embodiments, the therapeutic molecules areconjugated to the same unnatural amino acids. In certain embodiments,the therapeutic molecules are conjugated to different unnatural aminoacids.

In certain embodiments, the nature or chemistry of the unnatural amino acid/therapeutic molecule linkage allows cleavage of the linkage under certain conditions, such as mild or weak acidic conditions (e.g., about pH 4-6, preferably about pH 5), reductive environment (e.g., the presence of a reducing agent), or divalent cations, and is optionally accelerated by heat. See EP0318948A2.

In certain embodiments, the unnatural amino acid(s) and/or the therapeutic molecule comprises a chemically reactive moiety. The moiety may be strongly electrophilic or nucleophilic and thereby be available for reacting directly with the therapeutic molecule or the antibody or fragment thereof. Alternatively, the moiety may be a weaker electrophile or nucleophile and therefore require activation prior to the conjugation with the therapeutic molecule or the antibody or fragment thereof. This alternative would be desirable where it is necessary to delay activation of the chemically reactive moiety until an agent is added to the molecule in order to prevent the reaction of the agent with the moiety. In either scenario, the moiety is chemically reactive; the scenarios differ (in the reacting-with-antibody scenario) by whether, following addition of an agent, the moiety is reacted directly with an antibody or fragment thereof, or is reacted first with one or more chemicals to render the moiety capable of reacting with an antibody or fragment thereof. In certain embodiments, the chemically reactive moiety includes an amino group, a sulfhydryl group, a hydroxyl group, a carbonyl-containing group, or an alkyl leaving group.

In certain embodiments, the therapeutic molecule is conjugated to theantibody through a linker/spacer (e.g., one or more repeats of methylene(—CH₂—), methyleneoxy (—CH₂—O—), methylenecarbonyl (—CH₂—CO—), aminoacids, or combinations thereof).

Therapeutic molecules may include drugs, toxins (e.g., ricin, abrin, diphtheria toxin, and Pseudomonas exotoxin A), biological response modifiers, radiodiagnostic compounds, radiotherapeutic compounds, and derivatives or combinations thereof.

The invention also provides the use of the subject translation systems,host cells, and methods for generating such immunoconjugates.

E. Multiprotein Complexes

Unnatural amino acids can also be used to join two or more proteins orprotein sub-units with unique functionalities. For example, bispecificantibodies may be generated by linking two antibodies (or functionalparts thereof or derivatives thereof, such as Fab, Fab′, Fd, Fv, scFvfragments, etc.) through unnatural amino acids incorporated therein.

Although the electrophilic moiety (e.g., a keto moiety, an aldehydemoiety, and/or the like) and nucleophilic moiety described above insubsection C are introduced in the context of attaching sugar moietiesto proteins, the same set of electrophilic and nucleophilic moieties maybe used to join two protein molecules, such as two antibody molecules.

Thus the instant invention provides methods for synthesis ofmulti-protein conjugates. These methods involve, in some embodiments,incorporating into a first protein (e.g., a first antibody) a firstunnatural amino acid that comprises a first reactive group; andcontacting the first protein with a second protein (e.g., a secondantibody) comprising a second unnatural amino acid that comprises asecond reactive group, wherein the first reactive group reacts with thesecond reactive group, thereby forming a covalent bond that attaches thesecond protein to the first protein.

The first reactive group is, in some embodiments, an electrophilicmoiety (e.g., a keto moiety, an aldehyde moiety, and/or the like), andthe second reactive group is a nucleophilic moiety. In some embodiments,the first reactive group is a nucleophilic moiety and the secondreactive group is an electrophilic moiety (e.g., a keto moiety, analdehyde moiety, and/or the like). For example, an electrophilic moietyis attached to the unnatural amino acid of the first Ab, and thenucleophilic moiety is attached to the unnatural amino acid of thesecond Ab.

Different functional domains of different proteins may be linkedtogether through similar fashion to create novel proteins with novelfunctions (e.g., novel transcription factors with unique combination ofDNA binding and transcription activation domains; novel enzymes withnovel regulatory domains, etc.).

F. pH-Sensitive Binding

Many protein interactions are pH-sensitive, in the sense that bindingaffinity of one protein for its usual binding partner may change asenvironmental pH changes. For example, many ligands (such as insulin,interferons, growth hormone, etc.) bind their respective cell-surfacereceptors to elicit signal transduction. The ligand-receptor complexwill then be internalized by receptor-mediated endocytosis, and gothrough a successive series of more and more acidic endosomes.Eventually, the ligand-receptor interaction is weakened at a certainacidic pH (e.g., about pH 5.0), and the ligand dissociates from thereceptor. Some receptors (and perhaps some ligands) may be recycled backto cell surface. There, they may be able to bind their respective normalbinding partners.

If the pH-sensitive binding can be modulated such that the ligand-receptor complex can be dissociated at a relatively higher pH, then certain ligands may be dissociated earlier from their receptors, and become preferentially recycled to the cell surface rather than be degraded. This will result in an increased in vivo half-life of such ligands, which might be desirable since, for example, less insulin may be needed for the same (or better) efficacy in diabetic patients.

In other situations, it might be desirable to modulate the pH-sensitivebinding by favoring binding at a lower pH.

For example, monoclonal antibodies are generally very specific for their targets. However, in many applications, such as in cancer therapy, they tend to elicit certain side effects by, for example, binding to non-tumor tissues. One reason could be that the tumor targets against which monoclonal antibodies are raised are not specifically expressed on tumor cells, but are also expressed (although perhaps in smaller numbers) on some healthy cells. Such side effects are generally undesirable, and there is a need for antibodies with improved specificity.

The pH of human blood is highly regulated and maintained in the range of about 7.35-7.45. On the other hand, tumor cells have an extracellular pH of 6.3-6.5, due to the accumulation of metabolic acids that are inefficiently cleared because of poor tumor vascularization. If the interaction between a tumor antigen and its therapeutic antibody can be modulated such that binding is favored at low pH, the tumor antibody may have an added specificity/affinity/selectivity for those tumor antigens, even though the same tumor antigens are also occasionally found on normal tissues.

In fact, such modified antibodies may be desirable not only for cancertherapy, but also desirable for any antigen-antibody binding that mayoccur at a lower-than-normal level of pH.

Certainly, in the tumor antibody case, differences other than pH-sensitive binding in the extracellular region outside a tumor may also be explored to enhance tumor-specific binding. Such differences may include hypoxic conditions and/or differences in the enzymes present in the extracellular environment of tumors relative to healthy tissues.

Tumor Hypoxia. Due to the increased metabolic needs of tumor cells and the fact that tumor growth exceeds that of its supporting vasculature, oxygen is often in short supply in or around tumor tissues. This leads to tumor hypoxia. Certain enzymes are expressed during hypoxia, a characteristic that has been exploited to convert cancer prodrugs into active agents.

Tumor-Specific Extracellular Enzymes. Some tumor-specific enzymes thataccumulate in the local extracellular tumor environment can also beinvestigated as prodrug activators.

While it has been known that there are differences in themicro-environment of tumors and non-tumor tissues, such differences havenot been used to design and prepare antitumor antibodies with improvedspecificity.

The co-pending U.S. Ser. No. 11/094,625, filed on Mar. 30, 2005, describes methods, systems and reagents for regulating pH-sensitive protein interaction by incorporating non-natural amino acids into the protein (e.g. an antibody, or its functional fragment, derivative, etc.). The application also discloses specific uses in regulating pH-sensitive binding of antibodies to a tumor site, by conferring enhanced tumor-specificity/selectivity. In that embodiment, the non-natural amino acids preferably have desirable side-chain pKa's, such that at below physiological pH (e.g. about pH 6.3-6.5) the non-natural amino acid confers enhanced binding to tumor antigens in acidic environments. Such non-natural amino acids can be incorporated by the subject methods and systems. The entire content of U.S. Ser. No. 11/094,625 is incorporated herein by reference.
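The role of side-chain pKa can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of a titratable group that is protonated at a given pH. In the sketch below, the side-chain pKa of 6.8 and the comparison pH values of 7.4 and 6.4 are hypothetical values chosen for illustration only; they are not taken from the specification or from U.S. Ser. No. 11/094,625.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical side-chain pKa, assumed only for this illustration.
SIDE_CHAIN_PKA = 6.8

# Assumed comparison points: near-neutral pH versus an acidic
# tumor-like extracellular pH inside the 6.3-6.5 range mentioned above.
at_neutral = fraction_protonated(7.4, SIDE_CHAIN_PKA)
at_acidic = fraction_protonated(6.4, SIDE_CHAIN_PKA)
print(f"protonated at pH 7.4: {at_neutral:.2f}")
print(f"protonated at pH 6.4: {at_acidic:.2f}")
```

With these assumed values, roughly 20% of the side chains are protonated at pH 7.4 and roughly 72% at pH 6.4, which shows how a pKa lying between the two environments can produce a large change in charge state, and hence potentially in binding, over a one-unit pH difference.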

G. Coupling of Proteins to Protein Arrays

One key technology that can enable high throughput, highly parallelanalysis of polypeptides is the protein array (also called amicroarray). A protein microarray typically consists of manypolypeptides, each of which is attached to a solid support. Thepolypeptides in the microarray can be contacted with other molecules todetermine, for example, whether the molecule binds to or otherwiseinteracts with one or more of the polypeptides in the array. Thus, it isdesirable that each polypeptide in an array be attached to the solidsupport in a consistent orientation. Attachment of every polypeptide inthe array at or near its amino terminus or its carboxyl terminus, forexample, can help ensure that the active site or sites of eachpolypeptide are accessible to potentially interacting molecules.Moreover, the attachment of the polypeptide should not disrupt theconformation of the polypeptide, particularly if one desires to detectan activity of the immobilized polypeptides. Thus, a need exists forimproved protein arrays, and methods for their preparation. The presentinvention fulfills these and other needs.

The instant invention provides systems and methods to produce protein arrays, which are arrays of polypeptides on solid supports. The methods and systems of the invention allow one to couple a polypeptide to a solid support in such a manner as to preserve the function of the polypeptide. The covalent or non-covalent attachment generally does not substantially affect the structure, function, or biological activity of the polypeptide. The polypeptides that are used in the arrays of the invention incorporate at least one unnatural amino acid, where the side chain of the amino acid has a reactive group that can be used to couple the polypeptide to any suitable solid support. The arrays find use in a wide variety of applications.

The invention provides protein arrays where a polypeptide is attached to a solid support, and where the polypeptide incorporates at least one unnatural amino acid and the polypeptide is attached to the solid support by a chemical linkage that is formed from the reaction product between a first reactive group that is on the side chain of the unnatural amino acid and a second reactive group that is attached to a solid support. In this array, the first reactive group can be an electrophile, e.g., a keto or an aldehyde moiety, and the second reactive group can be a nucleophilic moiety. Alternatively, the first reactive group can be a nucleophilic moiety and the second reactive group can be an electrophile, e.g., a keto or an aldehyde moiety.

A wide variety of suitable reactive groups are well known to those of skill in the art. Such suitable reactive groups can include but are not limited to, for example, amino, hydroxyl, carboxyl, carboxylate, aldehyde, ester, ether (e.g. thio-ether), amide, amine, nitrile, vinyl, sulfide, sulfonyl, phosphoryl, or similarly chemically reactive groups. Additional suitable reactive groups include, but are not limited to, maleimide, N-hydroxysuccinimide, sulfo-N-hydroxysuccinimide, nitrilotriacetic acid, activated hydroxyl, haloacetyl (e.g., bromoacetyl, iodoacetyl), activated carboxyl, hydrazide, epoxy, aziridine, sulfonylchloride, trifluoromethyldiaziridine, pyridyldisulfide, N-acyl-imidazole, imidazolecarbamate, vinylsulfone, succinimidylcarbonate, arylazide, anhydride, diazoacetate, benzophenone, isothiocyanate, isocyanate, imidoester, and fluorobenzene.

In some embodiments, one of the reactive groups is an electrophilicmoiety, and the second reactive group is a nucleophilic moiety. Eitherthe nucleophilic moiety or the electrophilic moiety can be attached tothe side chain of the unnatural amino acid. That reactive group is thenused in a reaction that couples the polypeptide to the solid support.

Suitable electrophilic moieties that react with nucleophilic moieties to form a covalent bond are known to those of skill in the art. Such electrophilic moieties include, but are not limited to, a carbonyl group, a sulfonyl group, an aldehyde group, a ketone group, a hindered ester group, a thioester group, a stable imine group, an epoxide group, an aziridine group, etc.

The nucleophilic moiety used in the reactive group can be any suitable nucleophile, including but not limited to: aliphatic or aromatic amines, such as ethylenediamine; -NR1-NH₂ (hydrazide); -NR1(C═O)NR2NH₂ (semicarbazide); -NR1(C═S)NR2NH₂ (thiosemicarbazide); -(C═O)NR1NH₂ (carbonylhydrazide); -(C═S)NR1NH₂ (thiocarbonylhydrazide); -(SO₂)NR1NH₂ (sulfonylhydrazide); -NR1NR2(C═O)NR3NH₂ (carbazide); -NR1NR2(C═S)NR3NH₂ (thiocarbazide); and -O-NH₂ (hydroxylamine), where each R1, R2, and R3 is independently H or alkyl having 1-6 carbons, preferably H. In general, hydrazides, hydroxylamines, semicarbazides, sulfonylhydrazides, and carbonylhydrazides are all suitable nucleophilic moieties.

The reaction product of the nucleophile and the electrophile can be an oxime, an amide, a hydrazone, a carbohydrazone, a thiocarbohydrazone, a sulfonylhydrazone, a semicarbazone or a thiosemicarbazone. In some embodiments, the reaction product is a reduced hydrazone.

In some embodiments, one or more of the attached polypeptides on the protein array is at least about 6-50 amino acids in length, and in other embodiments, one or more of the attached polypeptides is at least about 50-100 amino acids or more in length. More specifically, at least about 50% of the attached polypeptides can be at least about 6-50 amino acids in length, or at least about 50% of the attached polypeptides are at least about 50-100 amino acids in length. In other embodiments, at least one of the attached polypeptides is a full-length polypeptide, while in other embodiments, at least one of the attached polypeptides is a fragment or portion of a full-length polypeptide.

The solid support used in the protein arrays can be any composition or format, without limitation. In one embodiment, the array is a logical array. In other embodiments, the protein array uses a microwell plate. In still other embodiments, the solid support used in the array is a bead to which the polypeptide is attached.

In some embodiments, the protein arrays of the invention have a plurality of different polypeptides. For example, a protein array can have at least about 10 different polypeptides, at least about 100 different polypeptides, or at least about 1000 different polypeptides.

In some embodiments, the polypeptides on the array carry modifications from posttranslational processing. These modifications can include, but are not limited to, glycosylation, phosphorylation, acetylation, methylation, myristoylation, prenylation, or proteolytic processing. In other embodiments, a polypeptide on the protein array is homologous to a native polypeptide.

It is not intended that the source of the polypeptide with the unnatural amino acid used on the protein array be particularly limited. The polypeptide can be produced in vivo, or can be produced synthetically. In one particular embodiment, the polypeptide with at least one unnatural amino acid is produced using a translation system that uses a nucleotide sequence with a degenerate codon, an orthogonal tRNA with an anticodon loop complementary to the degenerate codon (by Watson-Crick base pairing), and an aminoacyl tRNA synthetase that preferentially aminoacylates the tRNA with an unnatural amino acid, where the unnatural amino acid is incorporated into the polypeptide at the site of the degenerate codon.

In other embodiments, the invention provides methods for attaching the polypeptide to the solid support, thereby producing the protein array. In one aspect, the invention provides a method for attaching at least one polypeptide to a solid support, where the method uses the steps of incorporating into the polypeptide at least one unnatural amino acid that has a first reactive group and then reacting the first reactive group with a second reactive group that is attached to a solid support, thereby forming a covalent bond and attaching the polypeptide to the solid support. In this method, the first reactive group can be an electrophile, e.g., a keto or an aldehyde moiety, and the second reactive group can be a nucleophilic moiety; or alternatively, the first reactive group can be a nucleophilic moiety and the second reactive group can be an electrophile, e.g., a keto or an aldehyde moiety. In a variation of this method, the first reactive group, the second reactive group, or both can comprise a chemically protected moiety, and the method can further incorporate a deprotecting step prior to the reacting step. The protection/deprotection system can be a photolabile system (e.g., photodeprotection).

The polypeptides used in this method can be produced in an in vivo translation system, or produced synthetically. The polypeptide can be subject to posttranslational processing, including but not limited to, glycosylation, phosphorylation, acetylation, methylation, myristoylation, prenylation, or proteolytic processing. The polypeptide used in the method can be a full-length polypeptide, or alternatively, can be a fragment or portion of a full-length polypeptide.

In the methods for attaching the polypeptide to the solid support, any suitable nucleophile reactive group can be used. Suitable nucleophiles include -NR1-NH₂ (hydrazide), -NR1(C═O)NR2NH₂ (semicarbazide), -NR1(C═S)NR2NH₂ (thiosemicarbazide), -(C═O)NR1NH₂ (carbonylhydrazide), -(C═S)NR1NH₂ (thiocarbonylhydrazide), -(SO₂)NR1NH₂ (sulfonylhydrazide), -NR1NR2(C═O)NR3NH₂ (carbazide), -NR1NR2(C═S)NR3NH₂ (thiocarbazide), and -O-NH₂ (hydroxylamine), where each R1, R2, and R3 is independently H or alkyl having 1-6 carbons. The nucleophilic moiety can include any suitable nucleophile, e.g., hydrazide, hydroxylamine, semicarbazide, or carbonylhydrazide. In some methods, the second reactive group includes a linker that is attached to the solid support. That linker can be attached to the solid support after the first reactive group is reacted with the second reactive group. In other embodiments, the first reactive group includes a linker that is attached to the polypeptide.

In the methods for attaching the polypeptide to the solid support, any suitable solid support of any composition or format can be used, without limitation. In one embodiment, the solid support that forms the array forms a logical array. In other embodiments, the solid support makes use of a microwell plate. In still other embodiments, the solid support used in the array is a bead to which the polypeptide is attached.

In the methods for attaching the polypeptide to the solid support, a plurality of polypeptides can be optionally attached to the solid support. In this case, each of the polypeptides is attached to a discrete region of the solid support to form a protein array. It is not intended that the size of the polypeptides used in these methods be limited (supra).

The invention also provides biosensors that use protein arrays as described above. In one embodiment, the invention provides a biosensor that uses a polypeptide attached to a solid support by a chemical linkage that results from the reaction product between a first reactive group that is on a side chain of an unnatural amino acid incorporated into the polypeptide and a second reactive group that is attached to the solid support. In one embodiment, the polypeptide used in the biosensor is an antibody.

The invention provides methods for making a protein array, where the attachment between the polypeptide and the solid support is not limited to covalent linkages.

This method uses the steps of providing a solid support that has one or more binding or reactive moieties, providing a polypeptide of interest that incorporates one or more unnatural amino acids, and contacting the polypeptide of interest with the binding or reactive moiety, where the binding or reactive moiety binds to or reacts with the polypeptide of interest. In one embodiment of this method, the unnatural amino acid reacts with the reactive moiety to bind the protein of interest to the solid support. In another embodiment, the unnatural amino acid is bound to or uses a linker that binds to the binding moiety to bind the protein of interest to the solid support. For example, the linker can include a biotin and the binding moiety can incorporate avidin.

The invention also provides protein arrays that do not rely on covalent linkages to provide the attachment between the polypeptide and the solid support. These arrays incorporate a polypeptide attached to a solid support, wherein the polypeptide incorporates at least one unnatural amino acid and the polypeptide is attached to the solid support by a linkage that uses a non-covalent interaction between a chemical moiety on the side chain of the unnatural amino acid and a second chemical moiety that is attached to a solid support. The non-covalent interaction can be an ionic interaction or a van der Waals interaction. For example, unnatural amino acid side chains with suitable acidic groups will form strong associations with solid supports carrying hydroxyl or other negatively charged groups. In other variations of this system, other types of moieties having a strong affinity for each other can be incorporated into the reactive groups on the unnatural amino acid side chains and the solid support. For example, an unnatural amino acid side chain can be coupled with biotin through a suitable reactive group, while the solid support can be coated with avidin, resulting in an extremely strong non-covalent binding between the polypeptide containing the unnatural amino acid and the solid support.

Another example of a non-covalent interaction between the polypeptide and the solid phase that finds particular use with the invention is the use of specific antibodies. In this embodiment, an antibody can be raised against an unnatural amino acid side chain. If that unnatural amino acid is incorporated into a polypeptide, and that antibody is affixed to a solid phase, e.g., in a microwell plate array, the antibody then serves as an amino acid specific tether to bind the polypeptide to the solid phase.

The invention also provides a method for attaching at least one polypeptide to a solid support, where the method includes incorporating into the polypeptide at least one unnatural amino acid having a side chain with a first chemical moiety, providing a solid support with a second chemical moiety, providing a linker, where the linker has third and fourth chemical moieties, and combining the polypeptide, the linker, and the solid support under conditions whereby the first chemical moiety on the polypeptide attaches to the third chemical moiety on the linker and the second chemical moiety on the solid support attaches to the fourth chemical moiety on the linker, thereby forming a bridge between the polypeptide and the solid support and attaching the polypeptide to the solid support.

In some embodiments of this method, the linker is reacted with the polypeptide prior to reaction with the solid support, or alternatively, is reacted with the solid support prior to reaction with the polypeptide. The attachment between the first chemical moiety on the polypeptide and the third chemical moiety on the linker can be covalent or non-covalent. In the case where the attachment between the first and third chemical moieties is non-covalent, cognate moieties, such as avidin and biotin, can be used for coupling.

In other embodiments, the attachment between the second chemical moiety on the solid support and the fourth chemical moiety on the linker can be covalent or non-covalent. In the case where it is non-covalent, an avidin-biotin coupling can be used.

As used herein in this aspect of the invention, the term “solid support” refers to a matrix of material in a substantially fixed arrangement that can be functionalized to allow synthesis, attachment or immobilization of polypeptides, either directly or indirectly. The term “solid support” also encompasses terms such as “resin” or “solid phase.” A solid support may be composed of polymers, e.g., organic polymers such as polystyrene, polyethylene, polypropylene, polyfluoroethylene, polyethyleneoxy, and polyacrylamide, as well as copolymers and grafts thereof. A solid support may also be inorganic, such as glass, silica, silicon, controlled-pore-glass (CPG), reverse-phase silica, or any suitable metal. In addition to those described herein, it is also intended that the term “solid support” include any solid support that has received any type of coating or any other type of secondary treatment, e.g., Langmuir-Blodgett films, self-assembled monolayers (SAM), sol-gel, or the like.

As used herein, “array” or “microarray” is an arrangement of elements (e.g., polypeptides), e.g., present on a solid support and/or in an arrangement of vessels. While arrays are most often thought of as physical elements with a specified spatial-physical relationship, the present invention can also make use of “logical” arrays, which do not have a straightforward spatial organization. For example, a computer system can be used to track the location of one or several components of interest that are located in or on physically disparate components. The computer system creates a logical array by providing a “lookup” table of the physical location of array members. Thus, even components in motion can be part of a logical array, as long as the members of the array can be specified and located. This is relevant, e.g., where the array of the invention is present in a flowing microscale system, or when it is present in one or more microtiter trays.
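By way of illustration only, the “lookup” table of a logical array can be sketched in Python as follows. The class names, member labels, and tray and well identifiers are hypothetical and are not part of the invention as described; the sketch merely shows that array membership is defined by the table rather than by spatial arrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    container: str  # hypothetical label, e.g. a microtiter tray identifier
    position: str   # hypothetical label, e.g. a well such as "A1"


class LogicalArray:
    """Lookup table mapping each array member to its current physical location."""

    def __init__(self):
        self._table = {}

    def place(self, member, location):
        # Record (or update) where a member currently sits.
        self._table[member] = location

    def locate(self, member):
        # Raises KeyError if the member is not part of the array.
        return self._table[member]

    def members(self):
        return sorted(self._table)


array = LogicalArray()
array.place("polypeptide-1", Location("tray-1", "A1"))
array.place("polypeptide-2", Location("tray-2", "C7"))
# A member that moves remains part of the array as long as the table is updated.
array.place("polypeptide-1", Location("tray-3", "B2"))
```

Because the members are specified and located only through the table, physically disparate or moving components can belong to the same array.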

Certain array formats are sometimes referred to as a “chip” or “biochip.” An array can comprise a low-density number of addressable locations, e.g., 2 to about 10, a medium-density number, e.g., about a hundred or more locations, or a high-density number, e.g., a thousand or more. Typically, the chip array format is a geometrically-regular shape that allows for facilitated fabrication, handling, placement, stacking, reagent introduction, detection, and storage. It can, however, be irregular. In one typical format, an array is configured in a row and column format, with regular spacing between each location of member sets on the array. Alternatively, the locations can be bundled, mixed, or homogeneously blended for equalized treatment or sampling. An array can comprise a plurality of addressable locations configured so that each location is spatially addressable for high-throughput handling, robotic delivery, masking, or sampling of reagents. An array can also be configured to facilitate detection or quantitation by any particular means, including but not limited to, scanning by laser illumination, confocal or deflective light gathering, CCD detection, and chemical luminescence. “Array” formats, as recited herein, include but are not limited to, arrays (i.e., an array of a multiplicity of chips), microchips, microarrays, a microarray assembled on a single chip, arrays of biomolecules attached to microwell plates, or any other appropriate format for use with a system of interest.

VIII. General Techniques

General texts which describe molecular biological techniques that are applicable to the present invention, such as cloning, mutation, cell culture and the like, include Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, Calif. (“Berger”); Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 2000 (“Sambrook”); and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (supplemented through 2002) (“Ausubel”). These texts describe mutagenesis, the use of vectors, promoters and many other relevant topics related to, e.g., the generation of orthogonal tRNA, orthogonal synthetases, and pairs thereof.

Various types of mutagenesis are used in the present invention, e.g., to produce novel synthetases or tRNAs. They include but are not limited to site-directed mutagenesis, random point mutagenesis, homologous recombination (DNA shuffling), mutagenesis using uracil-containing templates, oligonucleotide-directed mutagenesis, phosphorothioate-modified DNA mutagenesis, mutagenesis using gapped duplex DNA or the like. Additional suitable methods include point mismatch repair, mutagenesis using repair-deficient host strains, restriction-selection and restriction-purification, deletion mutagenesis, mutagenesis by total gene synthesis, double-strand break repair, and the like. Mutagenesis, e.g., involving chimeric constructs, is also included in the present invention. In one embodiment, mutagenesis can be guided by known information of the naturally occurring molecule or altered or mutated naturally occurring molecule, e.g., sequence, sequence comparisons, physical properties, crystal structure or the like.

The above texts and examples found herein describe these procedures as well as the following publications and references cited within: Sieber, et al., Nature Biotechnology, 19:456-460 (2001); Ling et al., Approaches to DNA mutagenesis: an overview, Anal Biochem. 254(2): 157-178 (1997); Dale et al., Oligonucleotide-directed random mutagenesis using the phosphorothioate method, Methods Mol. Biol. 57:369-374 (1996); I. A. Lorimer, I. Pastan, Nucleic Acids Res. 23, 3067-8 (1995); W. P. C. Stemmer, Nature 370, 389-91 (1994); Arnold, Protein engineering for unusual environments, Current Opinion in Biotechnology 4:450-455 (1993); Bass et al., Mutant Trp repressors with new DNA-binding specificities, Science 242:240-245 (1988); Fritz et al., Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro, Nucl. Acids Res. 16: 6987-6999 (1988); Kramer et al., Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations, Nucl. Acids Res. 16: 7207 (1988); Sakmar and Khorana, Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin), Nucl. Acids Res. 14: 6361-6372 (1988); Sayers et al., 5′-3′ Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis, Nucl. Acids Res. 16:791-802 (1988); Sayers et al., Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide, Nucl. Acids Res. 16: 803-814 (1988); Carter, Improved oligonucleotide-directed mutagenesis using M13 vectors, Methods in Enzymol. 154: 382-403 (1987); Kramer & Fritz, Oligonucleotide-directed construction of mutations via gapped duplex DNA, Methods in Enzymol. 154:350-367 (1987); Kunkel, The efficiency of oligonucleotide directed mutagenesis, in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin) (1987); Kunkel et al., Rapid and efficient site-specific mutagenesis without phenotypic selection, Methods in Enzymol. 154, 367-382 (1987); Zoller & Smith, Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template, Methods in Enzymol. 154:329-350 (1987); Carter, Site-directed mutagenesis, Biochem. J. 237:1-7 (1986); Eghtedarzadeh & Henikoff, Use of oligonucleotides to generate large deletions, Nucl. Acids Res. 14: 5115 (1986); Mandecki, Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis, Proc. Natl. Acad. Sci. USA, 83:7177-7181 (1986); Nakamaye & Eckstein, Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis, Nucl. Acids Res. 14: 9679-9698 (1986); Wells et al., Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin, Phil. Trans. R. Soc. Lond. A 317: 415-423 (1986); Botstein & Shortle, Strategies and applications of in vitro mutagenesis, Science 229:1193-1201 (1985); Carter et al., Improved oligonucleotide site-directed mutagenesis using M13 vectors, Nucl. Acids Res. 13: 4431-4443 (1985); Grundstrom et al., Oligonucleotide-directed mutagenesis by microscale ‘shot-gun’ gene synthesis, Nucl. Acids Res. 13: 3305-3316 (1985); Kunkel, Rapid and efficient site-specific mutagenesis without phenotypic selection, Proc. Natl. Acad. Sci. USA 82:488-492 (1985); Smith, In vitro mutagenesis, Ann. Rev. Genet. 19:423-462 (1985); Taylor et al., The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA, Nucl. Acids Res. 13: 8749-8764 (1985); Taylor et al., The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA, Nucl. Acids Res. 13: 8765-8787 (1985); Wells et al., Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites, Gene 34:315-323 (1985); Kramer et al., The gapped duplex DNA approach to oligonucleotide-directed mutation construction, Nucl. Acids Res. 12: 9441-9456 (1984); Kramer et al., Point Mismatch Repair, Cell 38:879-887 (1984); Nambiar et al., Total synthesis and cloning of a gene coding for the ribonuclease S protein, Science 223: 1299-1301 (1984); Zoller & Smith, Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors, Methods in Enzymol. 100:468-500 (1983); and Zoller & Smith, Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment, Nucleic Acids Res. 10:6487-6500 (1982). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for trouble-shooting problems with various mutagenesis methods.

Oligonucleotides, e.g., for use in mutagenesis of the present invention, e.g., mutating libraries of synthetases, or altering tRNAs, are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers, Tetrahedron Letts. 22(20):1859-1862 (1981), e.g., using an automated synthesizer, as described in Needham-VanDevanter et al., Nucleic Acids Res., 12:6159-6168 (1984).

In addition, essentially any nucleic acid can be custom or standard ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company, The Great American Gene Company, ExpressGen Inc., Operon Technologies Inc. (Alameda, Calif.) and many others.

The present invention also relates to host cells and organisms for the in vivo incorporation of an unnatural amino acid via orthogonal tRNA/RS pairs. Host cells are genetically engineered (e.g., transformed, transduced or transfected) with the vectors of this invention, which can be, for example, a cloning vector or an expression vector. The vector can be, for example, in the form of a plasmid, a bacterium, a virus, a naked polynucleotide, or a conjugated polynucleotide. The vectors are introduced into cells and/or microorganisms by standard methods including electroporation (Fromm et al., Proc. Natl. Acad. Sci. USA 82, 5824 (1985)), infection by viral vectors, and high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al., Nature 327, 70-73 (1987)). Berger, Sambrook, and Ausubel provide a variety of appropriate transformation methods.

The engineered host cells can be cultured in conventional nutrient media modified as appropriate for such activities as, for example, screening steps, activating promoters or selecting transformants. These cells can optionally be cultured into transgenic organisms.

Other useful references, e.g., for cell isolation and culture (e.g., for subsequent nucleic acid isolation) include Freshney (1994) Culture of Animal Cells, a Manual of Basic Technique, third edition, Wiley-Liss, New York and the references cited therein; Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems, John Wiley & Sons, Inc. New York, N.Y.; Gamborg and Phillips (eds.) (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods, Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York); and Atlas and Parks (eds.) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, Fla.

Several well-known methods of introducing target nucleic acids into bacterial cells are available, any of which can be used in the present invention. These include: fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, and infection with viral vectors, etc. Bacterial cells can be used to amplify the number of plasmids containing DNA constructs of this invention. The bacteria are grown to log phase and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook). In addition, a plethora of kits are commercially available for the purification of plasmids from bacteria (see, e.g., EasyPrep™, FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAprep™ from Qiagen). The isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect cells or incorporated into related vectors to infect organisms. Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular target nucleic acid. The vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, or prokaryotes, or both (e.g., shuttle vectors), and selection markers for both prokaryotic and eukaryotic systems. Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both. See, Giliman & Smith, Gene 8:81 (1979); Roberts, et al., Nature, 328:731 (1987); Schneider, B., et al., Protein Expr. Purif. 6435:10 (1995); Ausubel, Sambrook, Berger (all supra). A catalogue of Bacteria and Bacteriophages useful for cloning is provided, e.g., by the ATCC, e.g., The ATCC Catalogue of Bacteria and Bacteriophage (1992) Gherna et al. (eds.) published by the ATCC. Additional basic procedures for sequencing, cloning and other aspects of molecular biology and underlying theoretical considerations are also found in Watson et al. (1992) Recombinant DNA Second Edition, Scientific American Books, NY.

EXAMPLES

This invention is further illustrated by the following examples, which should not be construed as limiting. The teachings of all references, patents and published patent applications cited throughout this application, as well as the Figures, are hereby incorporated by reference.

Example I tRNA and Synthetase Construction

This example illustrates the incorporation of an amino acid analog in proteins at positions encoded by codons which normally encode phenylalanine (Phe). A schematic diagram is shown in FIG. 1. Similar approaches can be used for any other analogs.

Phe is encoded by two codons, UUC and UUU. Both codons are read by a single tRNA, which is equipped with the anticodon sequence GAA. The UUC codon is therefore recognized through standard Watson-Crick base-pairing between codon and anticodon; UUU is read through a G-U wobble base-pair at the first position of the anticodon (Crick, J. Mol. Biol. 19: 548, 1966; Soll and RajBhandary, J. Mol. Biol. 29: 113, 1967). Thermal denaturation of RNA duplexes has yielded estimates of the Gibbs free energies of melting of G-U, G-C, A-U, and A-C basepairs as 4.1, 6.5, 6.3, and 2.6 kcal/mol, respectively, at 37° C. Thus the wobble basepair, G-U, is less stable than the Watson-Crick basepair, A-U. A modified tRNA^(Phe) outfitted with the AAA anticodon (tRNA^(Phe) _(AAA)) was engineered to read the UUU codon, and was predicted to read such codons faster than wild-type tRNA^(Phe) _(GAA). See FIG. 1.
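The pairing logic described above can be illustrated with a short Python sketch. The function names are hypothetical, and using the quoted free energies of melting as a per-position score for the pair between the first anticodon base and the third codon base is a simplifying assumption made only for illustration; it is not a model of translation kinetics.

```python
# Free energies of melting (kcal/mol, 37 °C) quoted in the text,
# keyed as (first anticodon base, third codon base).
PAIR_STABILITY = {
    ("G", "U"): 4.1,  # wobble pair
    ("G", "C"): 6.5,  # Watson-Crick pair
    ("A", "U"): 6.3,  # Watson-Crick pair
    ("A", "C"): 2.6,  # mismatch
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def watson_crick_anticodon(codon):
    """Anticodon (5'->3') that pairs with a codon (5'->3') by Watson-Crick rules."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def third_position_stability(anticodon, codon):
    """Quoted stability of the pair formed by anticodon base 1 and codon base 3."""
    return PAIR_STABILITY[(anticodon[0], codon[2])]


for codon in ("UUU", "UUC"):
    for anticodon in ("GAA", "AAA"):
        print(codon, anticodon, third_position_stability(anticodon, codon))
```

Under these assumptions, the AAA anticodon forms the more stable pair at UUU (6.3 versus 4.1 kcal/mol), while the GAA anticodon forms the more stable pair at UUC (6.5 versus 2.6 kcal/mol).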

Although tRNAs bearing unmodified A in the first position of the anticodon are known to read codons ending with C or U (Inagaki et al., J. Mol. Biol. 251: 486, 1995; Chen et al., J. Mol. Biol. 317: 481, 2002; Boren et al., J. Mol. Biol. 230: 739, 1993), the binding of E. coli tRNA^(Phe) _(GAA) at UUC should dominate that of tRNA^(Phe) _(AAA), owing to differences in the stability of A-C and G-C base pairs (see above).

We prepared a modified yeast tRNA^(Phe) (ytRNA^(Phe) _(AAA)) with an altered anticodon loop. The first base (G34) of the tRNA^(Phe) _(GAA) was replaced with A to provide specific Watson-Crick base-pairing to the UUU codon. Furthermore, G37 in the extended anticodon site was replaced with A to increase translational efficiency (see Furter, Protein Sci. 7: 419, 1998). We believe that charging of ytRNA^(Phe) _(AAA) by E. coli PheRS can be ignored, because the aminoacylation rate of ytRNA^(Phe) _(AAA) by E. coli PheRS is known to be <0.1% of that of E. coli tRNA^(Phe) _(GAA) (Peterson and Uhlenbeck, Biochemistry 31: 10380, 1992).

Since wild-type yeast PheRS does not activate amino acids significantly larger than phenylalanine, a modified form of the synthetase with relaxed substrate specificity was prepared to accommodate L-3-(2-naphthyl)alanine (Nal).

The modified yeast PheRS (mu-yPheRS) was prepared by introduction of a Thr415Gly mutation in the α-subunit of the synthetase (Datta et al., J. Am. Chem. Soc. 124: 5652, 2002). The kinetics of activation of Nal and Phe by mu-yPheRS were analyzed in vitro via the adenosine triphosphate-pyrophosphate exchange assay. The specificity constant (k_(cat)/K_(M)) for activation of Nal by mu-yPheRS was found to be 1.55×10⁻³ (s⁻¹ M⁻¹), 8-fold larger than that for Phe. Therefore, when the ratio of Nal to Phe in the culture medium is high, ytRNA^(Phe) _(AAA) should be charged predominantly with Nal.
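The reasoning in the last sentence can be made concrete with a minimal Python sketch. It assumes the standard result for two substrates competing for one enzyme, namely that the ratio of their reaction rates equals the ratio of their specificity constants multiplied by the ratio of their concentrations. The function name and the use of a single concentration ratio are illustrative assumptions; the sketch considers only the activation step and does not model transport of the amino acids into the cell.

```python
def nal_fraction(conc_ratio, selectivity=8.0):
    """Estimated fraction of activation events that use Nal rather than Phe.

    conc_ratio:  [Nal]/[Phe] available to the synthetase (assumed input)
    selectivity: (kcat/KM for Nal) / (kcat/KM for Phe); 8 per the text
    """
    if conc_ratio < 0:
        raise ValueError("concentration ratio must be non-negative")
    weighted = selectivity * conc_ratio
    return weighted / (weighted + 1.0)


for ratio in (0.1, 1.0, 10.0, 100.0):
    print(f"[Nal]/[Phe] = {ratio:>5}: fraction Nal = {nal_fraction(ratio):.3f}")
```

Under these assumptions the estimated fraction rises toward 1 as the Nal to Phe ratio increases, consistent with predominant charging with Nal at high ratios.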

Example II Generation of a Mutant Protein Containing Nal

Murine dihydrofolate reductase (mDHFR), which contains nine Phe residues, was chosen as the test protein. The expression plasmid pQE16 encodes mDHFR under control of a bacteriophage T5 promoter; the protein is outfitted with a C-terminal hexahistidine (HIS₆) tag to facilitate purification via immobilized metal affinity chromatography.

In this construct, four of the Phe residues of mDHFR are encoded by UUC codons, five by UUU. A full-length copy of the mu-yPheRS gene, under control of a constitutive tac promoter, was inserted into pQE16. The gene encoding ytRNA^(Phe) _(AAA) was inserted into the repressor plasmid pREP4 (Qiagen) under control of the constitutive promoter lpp. E. coli transformants harboring these two plasmids were incubated in Phe-depleted minimal medium supplemented with 3 mM Nal and were then treated with 1 mM IPTG to induce expression of mDHFR. Although the E. coli strain (K10-F6) used in this study is a Phe auxotroph (see Furter, supra), a detectable level of mDHFR was expressed even under conditions of nominal depletion of Phe, probably because of release of Phe through turnover of cellular proteins. In negative control experiments, mDHFR was expressed in the absence of either ytRNA^(Phe) _(AAA) or mu-yPheRS. The molar mass of mDHFR prepared in the absence of Nal, ytRNA^(Phe) _(AAA), or mu-yPheRS was 23,287 Da, precisely that calculated for HIS-tagged mDHFR. However, when ytRNA^(Phe) _(AAA) and mu-yPheRS were introduced into the expression strain and Nal was added to the culture medium, the observed mass of mDHFR was 23,537 Da (yield 2.5 mg/L after Ni-affinity chromatography). Because each substitution of Nal for Phe leads to a mass increment of 50 Da, this result is consistent with replacement of five Phe residues by Nal. No detectable mass shift was found in the absence of either ytRNA^(Phe) _(AAA) or mu-yPheRS, confirming that the intact heterologous pair is required for incorporation of Nal. For mDHFR isolated from the strain harboring the heterologous pair, amino acid analysis indicated replacement of 4.4 of the 9 Phe residues by Nal. Without ytRNA^(Phe) _(AAA) or mu-yPheRS, no incorporation of Nal into mDHFR was detected by amino acid analysis.
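The mass-shift arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the masses and the 50 Da increment per substitution are taken from the text.

```python
def estimated_substitutions(observed_mass, unmodified_mass, shift_per_site=50.0):
    """Number of Phe-to-Nal replacements implied by an intact-protein mass shift."""
    return (observed_mass - unmodified_mass) / shift_per_site


# Masses (Da) reported in the text for mDHFR without and with Nal incorporation.
n_sites = estimated_substitutions(23537.0, 23287.0)
print(n_sites)  # 5.0, matching the five UUU-encoded Phe positions
```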

Tryptic digests of mDHFR were analyzed to determine the occupancy of individual Phe sites. Digestion of mDHFR yields peptide fragments that are readily analyzed by MALDI mass spectrometry as shown in FIG. 2. Peptide 1_(UUU) (residues 184-191, YKFEVYEK, SEQ ID NO: 1) contains a Phe residue encoded as UUU, whereas peptides 2_(UUC) (residues 62-70, KTWFSIPEK, SEQ ID NO: 2) and 3_(UUC) (residues 26-39, NGDLPWPPLRNEFK, SEQ ID NO: 3) each contain a Phe residue encoded as UUC. In the absence of Nal, peptide 1_(UUU) was detected with a monoisotopic mass of 1105.55 Da, in accord with its theoretical mass (FIG. 2A). However, when Nal was added, a strong signal at a mass of 1155.61 Da was detected, and the 1105.55 Da signal was greatly reduced in intensity (FIG. 2B). As described earlier, each substitution of Nal for Phe leads to a mass increase of 50.06 Da; the observed shift in mass is thus consistent with replacement of Phe by Nal in response to the UUU codon. Liquid chromatography-tandem mass spectrometry (LC/MS/MS) confirmed this assignment. The ratio of MALDI signal intensities, though not rigorously related to relative peptide concentrations, suggests that Nal incorporation is dominant at the UUU codon.

Similar analyses were conducted for peptides 2_(UUC) and 3_(UUC). In the absence of added Nal, the observed masses of peptides 2_(UUC) and 3_(UUC) are 1135.61 (FIG. 2A) and 1682.89 Da (FIG. 2D), respectively, as expected. Upon addition of Nal to the expression medium, the 1135.61 and 1682.89 Da signals were not substantially reduced, and only weak signals were observed at masses of 1185.60 and 1733.03 Da (FIGS. 2B and 2E), which would be expected for peptides 2_(UUC) and 3_(UUC) containing Nal. Nal incorporation thus appears to be rare at UUC codons under the conditions used here for protein expression.
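The theoretical masses of the unmodified tryptic peptides can be checked with a short calculation. The sketch below assumes that the reported MALDI values correspond to singly protonated ions ([M+H]⁺) and uses standard monoisotopic residue masses; neither assumption is stated in the text, and the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N- and C-termini)
PROTON = 1.00728   # assumed single protonation in MALDI


def peptide_mh(sequence):
    """Return the monoisotopic [M+H]+ mass (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


# Peptides and observed masses (Da) as reported in the text.
reported = {
    "YKFEVYEK": 1105.55,        # peptide 1 (UUU)
    "KTWFSIPEK": 1135.61,       # peptide 2 (UUC)
    "NGDLPWPPLRNEFK": 1682.89,  # peptide 3 (UUC)
}
for seq, observed in reported.items():
    calc = peptide_mh(seq)
    print(f"{seq}: calculated {calc:.2f}, reported {observed:.2f}")
```

Under these assumptions the calculated values agree with the reported masses to within a few hundredths of a dalton.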

There is at least a formal possibility that the observed codon-biased incorporation of Nal might be dependent on codon context rather than, or in addition to, codon identity. MALDI sampling errors are also possible. To test these possibilities, a mutant mDHFR gene was prepared by mutating the UUU codon in peptide 1_(UUU) to UUC, and the UUC codon in peptide 3_(UUC) to UUU. In the resulting peptide 1_(UUC), the signal indicating incorporation of Nal was only slightly above background (FIG. 2C), whereas for peptide 3_(UUU), Nal is readily detected (FIG. 2F). Nal incorporation is unambiguously codon-biased to UUU.

The results described here show conclusively that a heterologous pair comprising a genetically engineered tRNA and cognate aminoacyl-tRNA synthetase can be used to break the degeneracy of the genetic code in E. coli.

Example III Application to Degenerate Leucine-Encoding Codons

In this example, multiple-site-specific incorporation of an unnatural amino acid into murine dihydrofolate reductase (mDHFR) in response to a sense codon was realized by use of an E. coli strain outfitted with a yeast transfer RNA (ytRNA^(Phe) _(CAA)) capable of Watson-Crick base-pairing with the leucine (Leu) codon UUG. ytRNA^(Phe) _(CAA) was charged with L-3-(2-naphthyl)alanine (Nal) by a co-expressed modified yeast phenylalanine tRNA synthetase. See schematic diagram in FIG. 3. Mass spectrometric analysis of tryptic digests of mDHFR showed that the UUG codon was partially re-assigned to Nal, whereas the other five Leu codons remained assigned to Leu.
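The strict Watson-Crick relationship between an anticodon and its target codon is simply the RNA reverse complement, which can be sketched as below. This is a simplified illustration that ignores modified nucleotides and wobble pairing; the function name is hypothetical. The GAA anticodon included for comparison is the commonly cited anticodon of natural tRNA^(Phe) and is given here as background, not as a value taken from this example.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def watson_crick_codon(anticodon):
    """Return the codon (written 5'->3') that pairs strictly Watson-Crick
    with the given anticodon (also written 5'->3').

    The two strands pair antiparallel, so the codon is the reverse
    complement of the anticodon.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


for anticodon in ("AAA", "CAA", "GAA"):
    print(f"anticodon {anticodon} -> codon {watson_crick_codon(anticodon)}")
```

The AAA anticodon maps to the Phe codon UUU used in the preceding example, and the CAA anticodon maps to the Leu codon UUG used here.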

Incomplete occupancy of the UUG codon by Nal is due at least in part to competition with leucine-charged E. coli tRNA^(Leu)s. In an attempt to reduce competition by E. coli tRNA^(Leu)s, use of a mutant E. coli strain lacking tRNA^(Leu) _(CAA) and addition of an E. coli leucyl-tRNA synthetase (LeuRS) inhibitor were tested. A Phe/Leu double auxotrophic strain derived from the tRNA^(Leu) _(CAA)-deficient strain XA106 (CGSC at Yale) was tested for incorporation of Nal at the UUG codon. Introduction of ytRNA^(Phe) _(CAA) into a mutant host lacking tRNA^(Leu) _(CAA) did not enhance the occupancy of the UUG sites by Nal, consistent with earlier proposals that E. coli tRNA^(Leu) _(CAA) is rarely involved in protein translation (Holmes, W. M.; Goldman, E.; Miner, T. A.; Hatfield, G. W. Proc. Natl. Acad. Sci. USA 74: 1393-1397, 1977). 4-Aza-DL-leucine (AZL) is a competitive inhibitor of E. coli LeuRS, and does not progress to the azaleucyl-adenylate in vitro. Addition of AZL resulted in enhanced occupancy of the UUG codon by Nal. The results described here demonstrate conclusively that the concept of breaking the degeneracy of the genetic code is quite general.

Replacement of Leu by Nal was detected in MALDI mass spectra of tryptic fragments of mDHFR (FIG. 4). Peptide 1_(UUG) (residues 145-162, IMQEFESDTFFPEIDL_(UUG)GK, SEQ ID NO: 4) contains a Leu residue encoded by UUG, whereas Peptide 1_(UUG) (Nal) refers to the form of the peptide containing Nal in place of Leu. Peptides 2_(UUG) (residues 3-25, GSGIMVRPL_(UUG)NSIVAVSQNMGIGK, SEQ ID NO: 5) and 4_(CUG) (residues 54-61, QNL_(CUG)VIMGR, SEQ ID NO: 6) were designated similarly. Peptide 3_(UUG/UUA) (residues 99-105, SL_(UUG)DDAL_(UUA)R, SEQ ID NO: 7) contains two Leu residues encoded as UUG and UUA, respectively, while Peptide 3_(UUA/UUA) contains two Leu residues, both encoded as UUA. Upon addition of Nal, the masses of peptide fragments 1-3 shift by 84.06 (1_(UUG)), 83.89 (2_(UUG)), and 84.18 (3_(UUA/UUA)) mass units, respectively, as expected for replacement of Leu by the larger Phe analog (Nal). The tandem mass spectrum of Peptide 3_(UUG/UUA) (Nal) confirmed that only the Leu encoded by UUG was replaced by Nal. Furthermore, Nal incorporation was not detected when UUG was mutated to UUA in Peptide 3. No signal corresponding to Peptide 4_(CUG) (Nal) was detected, whereas that corresponding to Peptide 4_(CUG) was detected at 904.54 mass units. These data confirm that incorporation of Nal is strongly biased to UUG.
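The expected mass shifts for substitution by Nal can be estimated from elemental compositions. The sketch below assumes the standard molecular formulas of the free amino acids (Phe C9H11NO2, Leu C6H13NO2, Nal C13H13NO2) and standard average atomic masses; the text does not state which mass convention underlies its quoted shifts, so the choice of average masses is an assumption, and the helper names are hypothetical.

```python
# Standard average atomic masses (Da); an assumed convention, see lead-in.
AVERAGE_ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Elemental compositions of the free amino acids.
FORMULAS = {
    "Phe": {"C": 9, "H": 11, "N": 1, "O": 2},
    "Leu": {"C": 6, "H": 13, "N": 1, "O": 2},
    "Nal": {"C": 13, "H": 13, "N": 1, "O": 2},  # L-3-(2-naphthyl)alanine
}


def average_mass(formula):
    """Return the average molecular mass (Da) for an elemental composition."""
    return sum(AVERAGE_ATOMIC_MASS[element] * count
               for element, count in formula.items())


def substitution_shift(new, old):
    """Return the mass change (Da) when residue `old` is replaced by `new`.

    The backbone atoms cancel, so the difference between the free amino
    acids equals the difference between the residues.
    """
    return average_mass(FORMULAS[new]) - average_mass(FORMULAS[old])


print(f"Phe -> Nal: {substitution_shift('Nal', 'Phe'):.2f} Da")
print(f"Leu -> Nal: {substitution_shift('Nal', 'Leu'):.2f} Da")
```

On this basis the Phe to Nal shift is close to the 50.06 Da quoted in the preceding example, and the Leu to Nal shift is close to 84.08 Da, in line with the observed shifts of 83.89 to 84.18 mass units.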

Replacement of Leu by Nal was detected in MALDI mass spectra of tryptic fragments of mDHFR expressed in tRNA^(Leu) _(CAA)-harboring E. coli (a) and tRNA^(Leu) _(CAA)-deficient E. coli (b). Peptide 3_(UUG/UUA) (residues 99-105, SL_(UUG)DDAL_(UUA)R, SEQ ID NO: 7) contains two Leu residues encoded as UUG and UUA, respectively. Upon addition of Nal, the masses of these fragments shift in accord with the mass difference between Nal and Leu, indicating that incorporation had occurred.

FIG. 5 shows the effect of AZL on replacement of Leu by Nal, as evaluated by MALDI mass spectra of tryptic fragments of mDHFR. Peptide 5_(UUG/UUG) (residues 26-35, NGDL_(UUG)PWPPL_(UUG)R, SEQ ID NO: 8) contains two Leu residues encoded as UUG. Upon addition of Nal, the masses of these fragments shift in accord with the mass difference between Nal and Leu. The media were supplemented with Nal only (a) or with Nal and 1 mM AZL (b).

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, cell biology, cell culture, microbiology and recombinant DNA, which are within the skill of the art. Such techniques are explained fully in the literature. See, for example, Molecular Cloning: A Laboratory Manual, 2^(nd) Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1989); DNA Cloning, Volumes I and II (D. N. Glover ed., 1985); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al., U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds. 1984); B. Perbal, A Practical Guide To Molecular Cloning (1984); the treatise, Methods In Enzymology (Academic Press, Inc., N.Y.); Methods In Enzymology, Vols. 154 and 155 (Wu et al. eds.); Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987).

The contents of all cited references (including literature references, issued patents, published patent applications as cited throughout this application) are hereby expressly incorporated by reference.

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific method and reagents described herein, including alternatives, variants, additions, deletions, modifications and substitutions. Such equivalents are considered to be within the scope of this invention and are covered by the following claims.

1. A polynucleotide encoding a modified tRNA, wherein said modified tRNA comprises a modified anticodon sequence that forms Watson-Crick base-pairing with a wobble degenerate codon for a natural amino acid.
2. The polynucleotide of claim 1, wherein the interaction between the modified tRNA and the wobble degenerate codon at 37° C. is at least about 1.0 kcal/mole more favorable than the interaction between the wild-type tRNA and the wobble degenerate codon.
3. The polynucleotide of claim 1, wherein said modified tRNA is derived from tRNA^(Phe), said wobble degenerate codon is UUU, and said unnatural amino acid is L-3-(2-naphthyl)alanine (Nal).
4. The polynucleotide of claim 1, wherein said modified tRNA further comprises a mutation at the fourth, extended anticodon site for increasing translation efficiency.
5. A method for incorporating an unnatural amino acid into a target protein at one or more specified position(s), the method comprising: (1) providing to a translation system a first polynucleotide of claim 1, or the modified tRNA encoded thereby; (2) providing to the translation system a second polynucleotide encoding a modified AminoAcyl tRNA Synthetase (AARS) with relaxed substrate specificity, or the modified AARS, wherein the modified AARS is capable of charging the modified tRNA with said unnatural amino acid; (3) providing to the translation system the unnatural amino acid; (4) providing to the translation system a template polynucleotide encoding the target protein, wherein the codon(s) on the template polynucleotide for said specified position(s) forms Watson-Crick base-pairing with the modified tRNA; and, (5) allowing translation of the template polynucleotide, thereby incorporating the unnatural amino acid into the target protein at the specified position(s), wherein steps (1)-(4) are effectuated in any order.
6. The method of claim 5, wherein the translation system is a cell.
7. The method of claim 5, wherein step (3) is effectuated by contacting the translation system with a solution containing the unnatural amino acid.
8. The method of claim 7, wherein the unnatural amino acid: is an analog of said natural amino acid; or is an analog of at least one amino acid different from said natural amino acid; or is not an analog of any natural amino acids; or comprises a side-chain R group selected from: alkyl-, aryl-, acyl-, keto-, azido-, hydroxyl-, hydrazine, cyano-, halo-, hydrazide, alkenyl, alkynyl, ether, thiol, seleno-, sulfonyl-, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, ester, thioacid, hydroxylamine, amino group, or the like or any combination thereof; or comprises a photoactivatable cross-linker, or is a spin-labeled amino acid, a fluorescent amino acid, a metal-binding amino acid, a metal-containing amino acid, a radioactive amino acid, an amino acid with novel functional group(s), an amino acid that covalently or noncovalently interacts with other molecules, a photocaged and/or photoisomerizable amino acid, an amino acid comprising biotin or a biotin analog, a glycosylated amino acid comprising a sugar-substituted serine, a carbohydrate-modified amino acid, a keto-containing amino acid, an amino acid comprising polyethylene glycol or polyether, a heavy atom-substituted amino acid, a chemically cleavable and/or photocleavable amino acid, an amino acid with an elongated side-chain as compared to natural amino acids, a carbon-linked sugar-containing amino acid, a redox-active amino acid, an amino thioacid-containing amino acid, or an amino acid comprising one or more toxic moieties; or is represented by Formula II or III:

wherein Z comprises -OH, -NH₂, -SH, -NH-R′, or S-R′; X and Y, which may be the same or different, comprise S or O, and R and R′, which may be the same or different, are selected from: alkyl-, aryl-, acyl-, keto-, azido-, hydroxyl-, hydrazine, cyano-, halo-, hydrazide, alkenyl, alkynyl, ether, thiol, seleno-, sulfonyl-, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, ester, thioacid, hydrogen, hydroxylamine, amino group, or the like or any combination thereof; or is selected from: α-hydroxy acids, α-thioacids, α-aminothiocarboxylates; or is L, D, or α-α-disubstituted amino acid selected from D-glutamate, D-alanine, D-methyl-O-tyrosine, or aminobutyric acid; or comprises a functional group selected from: bromo-, iodo-, ethynyl-, cyano-, azido-, acetyl, aryl ketone, photolabile, fluorescent, or heavy metal group; or is a cyclic amino acid selected from: a 3-, 4-, 6-, 7-, 8-, and 9-membered ring proline analog; a β or γ amino acid selected from substituted β-alanine or γ-amino butyric acid; or is a Tyrosine analog selected from: a para-substituted tyrosine, an ortho-substituted tyrosine, a meta-substituted tyrosine, wherein the substituted tyrosine comprises an acetyl group, a benzoyl group, an amino group, a hydrazine, a hydroxyamine, a thiol group, a carboxy group, an isopropyl group, a methyl group, a C6-C20 straight chain or branched hydrocarbon, a saturated or unsaturated hydrocarbon, an O-methyl group, a polyether group, a nitro group, or multiply substituted aryl rings; a Glutamine analog selected from: α-hydroxy derivatives, β-substituted derivatives, cyclic derivatives, or amide-substituted glutamine derivatives; a Phenylalanine analog selected from: meta-substituted phenylalanines, wherein the substituent comprises a hydroxy group, a methoxy group, a methyl group, an allyl group, an acetyl group, or the like; or is an O-methyl-L-tyrosine, an L-3-(2-naphthyl)alanine, a 3-methyl-phenylalanine, an O-4-allyl-L-tyrosine, a 4-propyl-L-tyrosine, a tri-O-acetyl-GlcNAcβ-serine, an L-Dopa, a fluorinated phenylalanine, an isopropyl-L-phenylalanine, a p-azido-L-phenylalanine, a p-acyl-L-phenylalanine, a p-benzoyl-L-phenylalanine, an L-phosphoserine, a phosphonoserine, a phosphonotyrosine, a p-iodo-phenylalanine, a p-bromophenylalanine, a p-amino-L-phenylalanine, or an isopropyl-L-phenylalanine; or modifies one or more biological properties of a protein into which it is incorporated, said biological properties comprising: toxicity, biodistribution, solubility, thermal stability, hydrolytic stability, oxidative stability, resistance to enzymatic degradation, facility of purification and processing, structural properties, spectroscopic properties, chemical and/or photochemical properties, catalytic activity, redox potential, half-life, ability to react with other molecules either covalently or noncovalently.
9. The method of claim 5, wherein said modified AARS with relaxed substrate specificity charges said modified tRNA with said unnatural amino acid.
10. The method of claim 9, wherein the specificity constant (k_(cat)/K_(M)) for activation of said unnatural amino acid by said modified AARS is at least 5-fold larger than that for said natural amino acid.
11. The method of claim 5, wherein said modified tRNA is charged by an endogenous AARS at a rate no more than 1% of that of its cognate tRNA.
12. The method of claim 5, wherein the unnatural amino acid is provided by introducing additional nucleic acid construct(s) into the translation system, wherein the additional nucleic acid construct(s) encode one or more proteins required for biosynthesis of the unnatural amino acid.
13. The method of claim 5, wherein the first polynucleotide and/or the second polynucleotide further comprises either a constitutively active or an inducible promoter sequence that controls the expression of the modified tRNA or AARS, respectively.
14. The method of claim 5, wherein the translation system is a cell, and the cell is auxotrophic for the natural amino acid encoded at the specified position.
15. The method of claim 5, wherein the translation system: (1) lacks endogenous tRNA that forms Watson-Crick base-pairing with the codon(s) at said specified position(s); (2) is a cell, and the method further comprises disabling one or more genes encoding any endogenous tRNA that forms Watson-Crick base-pairing with the codon(s) at said specified position(s); or (3) is a cell, and the method further comprises inhibiting one or more endogenous AARS that charges tRNAs that form Watson-Crick base-pairing with the codon(s) at said specified position(s).
16. The method of claim 5, wherein the cell is a bacterial cell, an E. coli cell, an insect cell, a mammalian cell, a fungal cell, or a yeast cell.
17. The method of claim 5, wherein the translation system is a cell, and the modified tRNA and/or the modified AARS are derived from an organism different from that of the cell.
18. The method of claim 5, further comprising verifying the incorporation of the unnatural amino acid.
19. The method of claim 5, wherein the analog is incorporated into the position at an efficiency of at least about 50%.
20. A translation system comprising the polynucleotide of claim 1.
21. The translation system of claim 20, further comprising a second polynucleotide encoding a modified AARS with relaxed substrate specificity, or the modified AARS, wherein the modified AARS is capable of charging the modified tRNA with an unnatural amino acid.
22. The translation system of claim 20, comprising more than two different polynucleotides of claim 1, each said polynucleotides capable of carrying a different unnatural amino acid.
23. The translation system of claim 20, which is a cell.
24. The translation system of claim 23, wherein the modified tRNA is from an organism different from that of the cell.
25. The translation system of claim 24, wherein the modified tRNA is from a yeast, and the cell is an E. coli bacterium.
26. The translation system of claim 23, wherein the modified AARS and the tRNA are from the same organism, said organism is different from that of the cell.
27. The translation system of claim 24, wherein the modified AARS and the tRNA are from a yeast, and the cell is an E. coli bacterium.